Preface


Chapter 1: Building Deep Learning Environment 


This chapter provides the foundational knowledge for the rest of the book. We describe how to
set up the deep learning environment used for developing the following projects on
industry-standard AWS infrastructure, using GPU-powered instances and the Deep Learning AMI.
We also explain how to evaluate the performance of a deep learning model and how to understand
the origin of its shortcomings.


Chapter 2: Training Neural Network for Prediction using Regression 


In this chapter, we introduce perceptrons, weights, single-layer training, and multi-layer
training, and discuss fundamental concepts in deep learning. You will develop your first deep
neural network as an example and build an understanding of gradient descent and
backpropagation, the methods used to train neural networks.


Chapter 3: Word Vector Representation using Word2Vec


In this chapter, we introduce the foundational knowledge of deep learning for computational
linguistics. We present the role of dense vector representations of words in various
computational linguistics tasks, and how to construct them from an unlabelled monolingual
corpus. We also present the role of language models in these tasks, and how to construct them
from an unlabelled monolingual corpus using Recurrent Neural Networks (RNN).


Chapter 4: Build NLP pipeline for Open-Domain Question Answering 


In this chapter, we build a system able to read an unstructured text and comprehend the answer
to a specific question. We also describe how to include this deep learning component in a classic
NLP pipeline for information retrieval, to provide an open-domain question answering system
that does not require a structured knowledge base.


Chapter 5: Sequence-to-sequence models for building chatbots 


In this chapter, we build a system that answers geography questions by translating them into a
query language targeting a database, using a sequence-to-sequence model with an attention
mechanism. We describe the limitation of having to model the entity resolution of every entity
mention, and how to separate this task using the copy mechanism. We will also cover how to
transfer knowledge about natural language understanding from one semantic parsing task to
another.


Chapter 6: Generative Language Modelling using Bi-LSTM


In this chapter, we build a system that generates rich product reviews and descriptions, starting 
from a short meaning representation sequence, containing structured information about the 
product. We describe the problem of semantic repetitions, and how to solve it by adding a dialog
act cell that is aware of the generation history.


Chapter 7: Building Speech Recognition with DeepSpeech2

In this chapter, we build a system that recognizes English speech, using the DeepSpeech2 model.

Chapter 8: Handwritten digits classification using ConvNets

In this chapter, we introduce the foundational knowledge of deep learning for computer
vision. We introduce Convolutional Neural Networks, Max Pooling, and Residual Connections.
We use these components to build a model capable of classifying handwritten digits.

Chapter 9: Real-time Object Detection using OpenCV and TensorFlow 


In this chapter, we present one of the most important uses of vision systems: detecting and 
recognizing objects. 


Chapter 10: Building Face Recognition using OpenFace and Clustering 


In this chapter, we build a face recognition system capable of identifying a person from a digital 
image. 


Chapter 11: Semantic Labeling of an image using Pixel-Level clustering and Depth Layering

In this chapter, we build a system for multi-class pixel-wise segmentation of an image.

Chapter 12: Automated Image Captioning with NeuralTalk model


In this chapter, we build a system that generates natural language descriptions of images and 
their regions, using the NeuralTalk model. 


Chapter 13: Pose Estimation on 3D models using ConvNets 


In this chapter, we build a system that estimates the 3D pose of a human using a new Pose 
Estimation method based on Convolutional Neural Networks (CNN). The first step is to create 
synthetic images of the object simulating a camera located at different points around it. The CNN 
is pre-trained with these thousands of synthetic images of the object model.


Chapter 14: Image translation using GANs for style transfer 


In this chapter, we build a system performing Image-to-Image translation, learning the mapping 
between an input image and an output image, using a training set of aligned image pairs, 
generating photos from paintings, colorizing black and white images, performing style
transfer, and more. 


Chapter 15: Developing Autonomous Agents with Deep Reinforcement Learning

In this chapter, we develop an agent that autonomously plays the game of Breakout. Starting
directly from the raw vision input, this agent learns its own behavior policy using deep
reinforcement learning, without any hand-engineered features or domain heuristics.


Chapter 16: Next Steps in your Deep Learning Career 


In this concluding chapter, we summarize the key skills taught throughout the book and integrate
concepts to unify your understanding of Deep Learning technologies. We also congratulate you
on the work you have done in executing the projects, and aim to give you the confidence to take
what you have learned out into the world.


Building Deep Learning Environment 


Welcome to the Applied AI Deep Learning team and to our first project - Building a Common 
Deep Learning Environment! We're excited about the projects we've assembled in this book. 
The foundation of a common working environment will help us work together and learn very 
cool and powerful Deep Learning technologies like computer vision and natural language 
processing that you will be able to use in your professional career as a data scientist. 


The following topics will be covered in the chapter: 


. Components in building a common deep learning environment 

. Setting up a local deep learning environment 

. Setting up a deep learning environment in the cloud 

. Using the cloud for deployment of deep learning applications 

. Automating this process to reduce errors and get started quickly 
Building a Common Deep Learning Environment


Our main goal for this chapter is to standardize the toolset so that we can work together and
achieve consistent, accurate results.


When building applications with Deep Learning (DL) algorithms that also need to scale for
production, it's very important to have the right kind of setup, locally or in the cloud, to make
things work end to end. So in this chapter, we will learn how to set up the DL environment that
we will use to run all the experiments and, finally, to take the AI models into production.


STRATEGY TIP: First, we will discuss the major components required to code, build, and
deploy DL models, then various ways to set them up, and finally a few code snippets that will
help automate the whole process.


Here is the list of components we need to build DL applications:


Ubuntu 16.04 or greater 

Anaconda Package 

Python 2.x/3.x 

Deep Learning packages: TensorFlow/Keras

CUDA for GPU support

Gunicorn for deployment at scale 
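Before installing anything, it can help to see which of these components a machine already has. The following is a minimal sketch using only the Python standard library; the package names it checks and the use of `nvidia-smi` (which ships with the NVIDIA driver) as a rough signal of GPU support are assumptions, not part of the book's setup script.

```python
import importlib.util
import platform
import shutil
import sys


def check_environment(packages=("tensorflow", "keras", "gunicorn")):
    """Report which deep learning components are present on this machine."""
    report = {
        # Operating system and Python interpreter details
        "os": platform.platform(),
        "python": "%d.%d.%d" % sys.version_info[:3],
        # conda is on PATH when Anaconda/Miniconda is installed and activated
        "conda": shutil.which("conda") is not None,
        # nvidia-smi comes with the NVIDIA driver; its presence suggests a usable GPU driver
        "nvidia_driver": shutil.which("nvidia-smi") is not None,
    }
    for name in packages:
        # find_spec returns None when a top-level package cannot be imported
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print("%-15s %s" % (key, value))
```

A missing entry simply shows up as `False`, so the same check can be rerun after each installation step.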


Get Focused and into the code! 


We'll start by setting up your local Deep Learning environment. Much of the work that you'll do
can be done on local machines, but with large datasets and complex model architectures,
processing time grows dramatically. This is why we are also setting up a Deep Learning
environment in the cloud: otherwise, these complex and repetitive calculations simply take too
long for us to get things done efficiently.


We will work straight through the list above and by the end (and with the help of a bit of 
automated script) you'll have everything set up! 


Deep Learning environment setup on a local machine


Throughout this book, we will be using Ubuntu OS to run all the experiments, because Linux has
great community support and almost any DL application can be set up easily on it. For any
assistance with installing and setting up Ubuntu, please refer to these tutorials
(https://tutorials.ubuntu.com/). On top of that, this book will use the Anaconda package with
Python 2.7+ to write, train, and test our code. Anaconda comes with a huge list of pre-installed
Python packages, such as numpy, pandas, and sklearn, which are commonly used in data science
projects.


Question: Why do we need Anaconda? Can we use vanilla python? 


Anaconda is a generic bundle that contains the IPython notebook, an editor, and lots of
preinstalled Python libraries. This saves the time needed to set everything up, so you can start
solving the Data Science problem quickly instead of configuring the environment.

Yes, you can use the default Python; it's entirely the reader's choice. At the end of this chapter,
we will learn how to configure a Python environment using a script.
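If you go with the standard Python interpreter instead of Anaconda, the standard library's `venv` module can create an isolated environment from a script. This is a minimal sketch, not the chapter's setup script; the helper name `create_env` and the directory name `dl-env` used below are hypothetical.

```python
import os
import sys
import venv


def create_env(path, with_pip=True):
    """Create an isolated Python environment at `path` and return its interpreter path."""
    # clear=True wipes any existing environment at the same path first;
    # with_pip=True bootstraps pip into the new environment via ensurepip
    builder = venv.EnvBuilder(clear=True, with_pip=with_pip)
    builder.create(path)
    # Windows places executables in Scripts\, other platforms in bin/
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return os.path.join(path, bin_dir, exe)
```

For example, `create_env("dl-env")` returns `dl-env/bin/python` on Linux, and running `dl-env/bin/python -m pip install tensorflow keras` would then install the libraries into that environment only.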


Download and Install Anaconda 


Anaconda is a very popular data science platform for people using Python to build machine 
learning and deep learning models and deployable applications. The Anaconda marketing team 
put it best in their "What is Anaconda" page (https://www.anaconda.com/what-is-anaconda/). To 
install Anaconda follow the given steps: 


. Click “Anaconda” from the menu and click “Download” to go to the download page
(https://www.continuum.io/downloads).
. Choose the download suitable for your platform (Linux, OSX or Windows):
    . Choose the Python 3.6 version
    . Choose the Graphical Installer
. Follow the instructions in the wizard, and in 10-20 minutes your Python setup will be ready.


Once the installation process is complete, you can use the following command to check the
Python version in your Linux terminal:


python --version


You should see output like the following (the exact version depends on the installer you chose):


Python 2.7 :: Anaconda, Inc.


If the command does not work or returns an error, please check the documentation for your
platform.


Installing Deep Learning Libraries 


Now we will install the Python libraries used for Deep Learning, specifically TensorFlow and
Keras.


Question: What is TensorFlow? 

TensorFlow is a Python library developed and maintained by Google. You can implement many
powerful machine learning and Deep Learning architectures in custom models and applications
using TensorFlow. To learn more, visit https://www.tensorflow.org/.


Install the TensorFlow deep learning library (all except Windows) by typing: 


conda install -c conda-forge tensorflow 


Alternatively, you may choose to install using pip and a specific version of TensorFlow for your 
platform. 


pip install tensorflow==1.6 


See the installation instructions for TensorFlow 
(https://www.tensorflow.org/get_started/os_setup#anaconda installation). 


Now we will install Keras using the following command:


pip install keras 


To validate the environment and the versions of the packages, create the following script, which
will print the version number of each library.


# Import the tensorflow library
import tensorflow

# Import the keras library
import keras

# Print the installed version of each library
print('tensorflow: %s' % tensorflow.__version__)
print('keras: %s' % keras.__version__)


Save the script to a file named dl_version.py. Run the script by typing:


python dl_version.py


You should see output like: 


tensorflow: 1.6.0 
Using TensorFlow backend. 
keras: 2.1.5 


Voilà! Our Python development environment is now ready for writing awesome Deep Learning
applications locally.


Deep Learning environment setup in the cloud


All the steps we have performed so far remain the same for the cloud. However, a few additional
modules are required to configure the cloud virtual machines so that your DL applications are
servable and scalable. So before setting up your server, follow the instructions from the
previous section.


To deploy your Deep Learning applications in the cloud, you will need a server powerful enough
to train your models and serve them at the same time. With the huge growth of the Deep
Learning space, the need for cloud servers to practice on and deploy projects has increased
drastically, and so have the options in the market. Here is a list of the best options to choose
from:


. PaperSpace (https://www.paperspace.com/) 


. FloydHub (https://www.floydhub.com) 
. Amazon Web Services (https://aws.amazon.com/) 


. Google Cloud Platform (https://cloud.google.com/) 
. Digital Ocean (https://cloud.digitalocean.com/) 
All of the options mentioned above have pros and cons, and the choice depends entirely on your
use case and preferences, so feel free to explore them. In this book, we will build and deploy our
models mostly on Google Compute Engine (GCE), which is a part of Google Cloud Platform
(GCP). The steps to spin up a VM server and get started are given in the Setup the GCP section.


Google has also released a hosted notebook platform, Google Colab
(https://colab.research.google.com/), which comes pre-installed with all the Deep Learning
packages and other Python libraries. You can write all your ML/DL applications on Google's
cloud, leveraging free GPUs for 10 running hours.


Cloud platforms for Deployment 


The main idea behind this book is to empower you to build and deploy Deep Learning
applications. In this section, we will discuss some critical components that are required to
make your applications accessible to millions of users.


The best way to make your application accessible is to expose it as a web service using REST or
SOAP APIs. To do so, there are many Python web frameworks, such as web.py, Flask, Bottle, and
many more. These frameworks allow us to easily build web services and deploy them.


Prerequisites 


You should have a Google Cloud (https://cloud.google.com/) account. Google is promoting
usage of its platform right now and is giving away $300 of credit and 12 months as a
free tier user.


Setup the GCP 


Follow the given instructions to set up your Google Cloud Platform: 


1. Creating a new project: Click on the three dots shown in the following image and then 
click on the + sign to create a new project. 


[Image: Google Cloud Platform console]





2. Spinning up a VM instance: Click on the three lines in the upper-left corner of the screen,
select the Compute option, and click on ‘Compute Engine’. Now choose ‘Create new instance’.
Name the VM instance, select your zone as ‘us-west2-b’, and choose the ‘machine type’ size.


Choose your boot disk as ‘Ubuntu 16.04 LTS’. Under Firewall, check both "Allow HTTP traffic"
and "Allow HTTPS traffic" (this is important for making the VM accessible from the outside
world). To opt for GPU options, you can click on the ‘customize’ button and find the GPU
options. You can choose between 2 NVIDIA GPUs.


Now click on ‘Create’. Boom! Your new VM is getting ready.


3. Modify the Firewall settings: Now, click on the ‘Firewall rules’ setting under 
Networking. Under protocols and ports, we need to select a port which we will use to 
export our APIs. We have chosen tcp:s8ose as our port number. Now click on the save 
button. This will assign a rule in the firewall of your VM to access the applications from 
the external world. 


4. Boot your VM: Now start your VM instance. When you see the green tick, click on SSH.
This will open a command window, and you are now inside the VM. You can also use the
gcloud CLI to log in and access your VMs.


5. Then follow the same steps we performed to set up the local environment, or read further to
create an automation script that will perform the whole setup automatically.


Now we need a web framework to write our DL applications as web services. Again, there are
lots of options, but to keep it simple we will use the combination of web.py and Gunicorn.


If you want to know which web framework to choose based on memory consumption, CPU
utilization, etc., you can have a look at these comprehensive benchmarks:
http://klen.github.io/py-frameworks-bench

Let's install them using the following commands:


pip install web.py 
pip install gunicorn 


Now we are ready to deploy our Deep Learning solution as web services and scale it to the 
production level. 
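Gunicorn serves applications that follow Python's WSGI interface. To show what such a service looks like without any framework, here is a minimal sketch of a prediction endpoint written as a plain WSGI callable; this is a swapped-in illustration rather than web.py code, and `predict()` is a hypothetical stand-in for a trained model.

```python
import json


def predict(features):
    """Hypothetical stand-in for a trained model: returns the index of the largest value."""
    return max(range(len(features)), key=lambda i: features[i])


def _json_response(start_response, status, payload):
    """Serialize `payload` as JSON and send it with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def application(environ, start_response):
    """WSGI entry point: POST a JSON body such as {"features": [...]} to get a prediction."""
    if environ.get("REQUEST_METHOD") != "POST":
        return _json_response(start_response, "405 Method Not Allowed",
                              {"error": "use POST"})
    try:
        # Read exactly CONTENT_LENGTH bytes from the request body
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        features = [float(v) for v in payload["features"]]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing key, or non-numeric values all count as bad input
        features = []
    if not features:
        return _json_response(start_response, "400 Bad Request",
                              {"error": "expected JSON with a non-empty 'features' list"})
    return _json_response(start_response, "200 OK", {"prediction": predict(features)})
```

Saved as app.py (an assumed file name), it could be started with `gunicorn --workers 2 --bind 0.0.0.0:<port> app:application`, using the port you opened in the firewall. Running several workers lets Gunicorn handle requests in parallel processes, which is the main reason to put it in front of a Python web app.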


Automating the setup process 


Installation of python packages and DL libraries can be a tedious process which requires lots of 
time and repetitive effort. So to ease the job we will create a bash script which can be used to 
install everything using a single command. 


List of components which will get installed and configured: 


Java 8 

Bazel for building 

Python and associated deps. 

TensorFlow 

Keras 

Git 

Unzip 

Dependencies for all of the above. See script for exact details. 
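The chapter's own automation is the bash script described below. As a rough Python counterpart limited to the pip-installable packages (it does not handle Java, Bazel, Git, or Unzip), the following sketch builds a single pip command and runs it only when asked. The package list and the `--run` flag are hypothetical examples.

```python
import subprocess
import sys

# Hypothetical package list for illustration; pin versions to match your project
PACKAGES = ["tensorflow", "keras", "web.py", "gunicorn"]


def build_install_command(packages, upgrade=False):
    """Return the pip command that installs `packages` into the current interpreter."""
    # `sys.executable -m pip` makes sure pip targets the Python that runs this script
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    return cmd + list(packages)


def install(packages, dry_run=True):
    """Print the install command and, unless dry_run is set, execute it."""
    cmd = build_install_command(packages)
    print("Running:", " ".join(cmd))
    if dry_run:
        return 0
    # check=True raises CalledProcessError if pip fails, stopping the setup early
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    install(PACKAGES, dry_run="--run" not in sys.argv)
```

Defaulting to a dry run lets you review the exact command before anything is installed.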


You can simply download the automation script to your server or local machine, execute it, and
you're DONE. Steps to follow:


1. Save the script to your home directory by cloning the code from the repository:


git clone https://github.com/PacktPublishing/Python-Deep-Learning-Projects


2. Once you have a copy of the complete repo, move to the Chapter 1 folder, which contains a
script file named setupDeepLearning.sh. This is the script we will execute to start the setup
process, but before execution we have to make it executable using the chmod command:


cd Python-Deep-Learning-Projects/Chapter\ 1/
chmod +x setupDeepLearning.sh


3. Once done, we are ready to execute it:


./setupDeepLearning.sh


Follow any instructions that appear (basically, say yes to everything and accept the Java license).
This will take about 10-15 minutes to install everything. Once it has finished, you will see the
list of installed Python packages (as shown in Figure 1).


absl-py (0.1.13)
backports.weakref (1.0.post1)
scikit-learn (0.19.1)
scipy (1.0.1)
setuptools (20.7.0)
sklearn (0.0)
tensorboard (1.7.0)
tensorflow (1.7.0)


Figure 1: Listed packages with TensorFlow and other Python dependencies


There are a couple of other options too, such as using Docker images of TensorFlow and other
DL packages, which can set up fully functional Deep Learning machines for large-scale,
production-ready environments. You can learn more about Docker here
( ). Also, for a quick start, follow the instructions in this repository: All-in-one Docker image
for deep learning ( ).


Summary 


In this chapter, we worked to set the team up in a common environment with a standardized
toolset. We are looking to deploy our project applications utilizing Gunicorn and CUDA. Those
projects will rely on highly advanced and effective Deep Learning libraries such as TensorFlow
and Keras running in Python 2.x/3.x. We'll write our code using the resources in the Anaconda
Package, and all of this will run on Ubuntu 16.04 or greater.


Now we are all set to perform experiments and deploy our Deep Learning models in production! 
Training NN for Prediction using Regression


Introduction 


Welcome to our first proper project in Python Deep Learning! What we'll be doing today is 
building a classifier to solve the problem of identifying specific handwriting samples from a 
dataset of images. We've been asked (in this hypothetical use case) to do this by a restaurant 
chain that has the need to accurately classify handwritten numbers into digits. What they have 
their customers do is write their phone numbers in a simple iPad application. At the time when 
they can be seated, the guest will get a text prompting them to come to see the restaurant's host. 
We need to accurately classify the handwritten numbers so that the output can be the accurately 
predicted labels for the digits of a phone number. This can then be sent to their (hypothetical) 
auto dialer service for text messages and the notice gets to the right hungry customer! 


DEFINE SUCCESS - A good practice is to define the criteria for success at the beginning of a 
project. What metric should we use for this project? Let's use a global test accuracy as a 
percentage to measure our performance in this project. 


The data science approach to the problem of classification can be configured in a number of 
ways. In fact, later in this book, we'll look at how to increase the accuracy in image classification 
with Convolutional Neural Networks. 


TRANSFER LEARNING - Pretraining a deep learning model on a different (but quite similar)
dataset to speed up learning and improve accuracy on another (often smaller) dataset. In our
hypothetical use case, pretraining our deep learning multilayer perceptron on the MNIST
dataset would enable the deployment of a production handwriting classification system without
a long period of collecting data samples in a live but non-functional system. Python Deep
Learning Projects are cool!


Let's start with the baseline deep neural network model architecture. We will firmly establish our
intuition and skills, and this will prepare us for learning more complex architectures to solve a
wider variety of problems as we progress through the projects in this book.


What we'll learn in this chapter:


. What is a Multilayer Perceptron?

. Explore a common open source handwriting dataset - the MNIST dataset

. Build our intuition and preparations for the model architecture

. Code the model and define hyperparameters

. Build the training loop

. Test the model!
Building a regression model for prediction using a multilayer perceptron - A deep neural network


In any real job working in an AI team, one of the primary goals will be to build regression
models that can make predictions on non-linear datasets. Because of the complexity of the real
world and the data that you'll be working with, simple linear regression models won't provide
the predictive power you're seeking. This is why, in this chapter, we will discuss how to build
world-class prediction models using Multilayer Perceptrons (MLPs). More information can be
found at http://www.deeplearningbook.org/contents/mlp.html.





Figure 2.1: Multilayer Perceptron (MLP) with 2 hidden layers 




We will implement a neural network with a simple architecture of only 2 hidden layers, using
TensorFlow, that will perform regression on the MNIST dataset
(http://yann.lecun.com/exdb/mnist/), which we'll provide. We can (and will) go deeper in
architecture in later projects! We assume that you are already familiar with backpropagation (if
not, please read the article on backpropagation by Michael Nielsen
at http://neuralnetworksanddeeplearning.com/chap2.html). We'll not spend much time on how
TensorFlow works, but you can refer to this official tutorial
(https://www.tensorflow.org/versions/r0.10/get_started/basic_usage.html) if you are interested in
looking "under the hood" of that technology.
Exploring the MNIST dataset


Before we jump into building our awesome neural network, let's first have a look at the famous
MNIST dataset and visualize it.


Words of Wisdom: You must know your data and how it has been pre-processed to know how 
the models you build perform the way they do. This section reviews the significant work that has 
been done in preparation of the dataset to make our current job of building the multilayer 
perceptron (MLP) easier. Always remember: Data Science begins with DATA! 


Let's start therefore by downloading the data: 


from tensorflow.examples.tutorials.mnist import input_data 
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True) 


If we examine the content of the mnist variable, we see that it is structured in a specific format
with 3 major components: train, test, and validation. Each set has handwritten images and their
respective labels. The images are stored in a flattened way, each as a single vector:


Set          Images         Labels
TRAIN        55000 x 784    55000 x 10
TEST         10000 x 784    10000 x 10
VALIDATION   5000 x 784     5000 x 10


Figure 2.1: Format of MNIST dataset
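The layout above, with each image flattened into a 784-value row and each label stored as a 10-column one-hot row, can be reproduced with NumPy. This sketch uses random data rather than the real dataset, and the helper names `flatten_images` and `one_hot` are hypothetical.

```python
import numpy as np


def flatten_images(images):
    """Turn a batch of [N, 28, 28] images into the [N, 784] layout MNIST uses."""
    images = np.asarray(images, dtype=np.float32)
    return images.reshape(images.shape[0], -1)


def one_hot(labels, num_classes=10):
    """Encode integer digit labels [N] as one-hot rows [N, num_classes]."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Put a 1 in the column matching each label
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


# Three fake 28x28 "images" and their digit labels
fake_images = np.random.rand(3, 28, 28)
x = flatten_images(fake_images)
y = one_hot([5, 0, 4])
print(x.shape, y.shape)   # (3, 784) (3, 10)
print(y[0])               # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

Taking `argmax` of a one-hot row recovers the original digit, which is exactly how the accuracy computation later in this chapter compares predictions with labels.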
Let's extract one image from the dataset and plot it. A single image is stored as a flat vector of
784 values, which we need to reshape into [28,28] to visualize the original image:


sample_image = mnist.train.images[0].reshape( [28,28] ) 


Once we have the image matrix, we will use matplotlib to plot it as follows:


import matplotlib.pyplot as plt


plt.gray()
plt.imshow(sample_image)
plt.show()


Output: 







Figure 2.2: A sample image from the MNIST dataset


Similar to this image, there are 55,000 images of handwritten digits [0-9] in the training set. The
labels in the mnist dataset are the true values of the digits present in the images. Our objective,
then, is to train a model with this set of images and labels so that it can predict the label of any
provided image from the mnist dataset.


Be a Deep Learning Explorer: If you are interested in playing around with the dataset, you can try
the Colab notebook here
(https://drive.google.com/file/d/1-GVlob72EyiJyQpk8EL2fg2mvzaEayJ/view?usp=sharing).
Intuition and Preparation


Let's build our intuition around this project. What we need to do is to build a deep learning 
technology that accurately assigns class labels to an input image. We're using a deep neural 
network known as a Multilayer Perceptron to do this. Core to this technology is the mathematics
of regression. The specific calculus proofs are outside the scope of this book, but in this section 
we provide the basis for your understanding. We also outline the structure of the project so that 
it's easy to understand the primary steps needed to create our desired results. 
Defining Regression


Our first task is to define the model that can perform regression on the provided MNIST dataset.
So we will create a TensorFlow model: a fully connected neural network with 2 hidden layers.
You may also hear it referred to as a Multilayer Perceptron.


The model will fit Equation 2.1, where y is the label, x is the image, W is the weight matrix that
the model will learn, and b is the bias, which will also be learned by the model:


y = nonlinearity(xW + b) 

Equation 2.1: The regression equation for the model 
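To make Equation 2.1 concrete, here is a NumPy sketch of a forward pass that applies y = nonlinearity(xW + b) layer by layer, with layer sizes mirroring the hyperparameters used later in this chapter (784 inputs, two hidden layers of 300 units, 10 classes). ReLU is an assumed choice of nonlinearity, and the 0.01 weight scale is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 784 inputs, two hidden layers of 300 units, 10 output classes
sizes = [784, 300, 300, 10]
weights = [rng.normal(size=(m, n)) * 0.01 for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]


def relu(z):
    """Assumed nonlinearity: zero out negative values."""
    return np.maximum(z, 0.0)


def forward(x):
    """Apply y = nonlinearity(xW + b) per hidden layer; the last layer returns raw logits."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)              # hidden layer: affine map followed by ReLU
    return h @ weights[-1] + biases[-1]  # output layer: logits, no nonlinearity


batch = rng.random((5, 784))   # five fake flattened images
logits = forward(batch)
print(logits.shape)            # (5, 10)
```

The nonlinearity is what gives the extra layers their value: without it, stacked layers compose into a single linear map xW' + b', no more expressive than one layer.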

SUPERVISED LEARNING: When you have data and accurate labels for the training set (i.e.,
you know the answer), you are in a Supervised Deep Learning paradigm. Model training is a
mathematical process by which the features of the data are learned and associated with the proper
labels, so that when a new data point (test data) is presented, the accurate output class label can be
produced (i.e., when you present a new data point and do not have the label (don't know the
answer), your model can produce it for you with a highly reliable class prediction).


Each training iteration will adjust the values of the weights and biases to reduce the error rate.
Also keep in mind that we need to ensure that the model is not overfitting, which may lead to
wrong predictions on unseen data. We'll show you how to code this and visualize the progress
to aid your intuition of model performance.
Defining project structure


Let's structure our project as shown in the following pattern: 


. hy_param.py: All the hyperparameters and other configurations are defined here
. model.py: The definition and architecture of the model are defined here
. train.py: The code to train the model is written here
. inference.py: The code to execute the trained model and make predictions is defined here
. /runs: This folder will store all the checkpoints saved during the training process
You can clone the code from the repository and find it in the Chapter 2 folder:


GITHUB LINK: https://github.com/PacktPublishing/Python-Deep-Learning-Projects/ 
Let's Code the Implementation!


To code the implementation, we'll start by defining the hyperparameters, then we will define the
model, followed by building and executing the training loop. We conclude by checking whether
our model is overfitting and building the inference code, which loads the latest checkpoint and
then makes predictions on the basis of the learned parameters.
Defining hyperparameters


We will define all the required hyperparameters in hy_param.py and then import it as a module in
our other scripts. This makes deployment easier, and it's good practice to make your code as
modular as possible. Let's look into the hyperparameter configurations that we have in
our hy_param.py file:


# Parameters
learning_rate = 0.01
num_steps = 100
batch_size = 128
display_step = 1

# Network Parameters
n_hidden_1 = 300 # 1st layer number of neurons
n_hidden_2 = 300 # 2nd layer number of neurons
num_input = 784 # MNIST data input (img shape: 28*28)
num_classes = 10 # MNIST total classes (0-9 digits)

# Training Parameters
checkpoint_every = 100
checkpoint_dir = ...  # value elided in the source; checkpoints are stored under /runs





We will be using these values throughout our code and it's totally configurable. 


Python Deep Learning Projects Exploration Opportunity! We invite you, our project teammate
and reader, to try different values of the learning rate and the number of hidden layers to
experiment and build better models.


Since the flat vectors of images (as shown in Figure 2.1) are of size [1 x 784], num_input=784 is
fixed in this case. In addition, the class count in the MNIST dataset is 10: we have digits from
0-9, so num_classes=10.
Model Definition


First, we will load the Python modules, in this case the TensorFlow package and the
hyperparameters that we have defined previously:


import tensorflow as tf 
import hy_param 


Then we define the placeholders that we will be using for input data in the model. The
tf.placeholder allows us to feed input data to the computational graph. We can constrain the
shape of a placeholder so that it only accepts a tensor of a certain shape. Note that it is common
to provide None for the first dimension, which allows us to set the size of the batch at
runtime.


Master Your Craft: Batch size can often have a big impact on the performance of Deep 
Learning models. Explore different batch sizes in this project. What changes? What's your 
intuition? Batch size is another tool in your Data Science toolkit! 


We have also assigned names to the placeholders so that we can use them later on while building 
our inference code: 


X = tf.placeholder("float", [None, hy_param.num_input], name="input_x")
Y = tf.placeholder("float", [None, hy_param.num_classes], name="input_y")


Now we will define variables that will hold the values for weights and biases. tf.Variable allows
us to store and update Tensors in our graph. To initialize our variables with random values from
a normal distribution, we will use tf.random_normal() (more details can be found at
https://www.tensorflow.org/api_docs/python/tf/random_normal). The important thing to notice
here is how the variable sizes map between layers:


weights = {
    'h1': tf.Variable(tf.random_normal([hy_param.num_input, hy_param.n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([hy_param.n_hidden_1, hy_param.n_hidden_2])),
    'out': tf.Variable(tf.random_normal([hy_param.n_hidden_2, hy_param.num_classes]))
}

biases = {
    'b1': tf.Variable(tf.random_normal([hy_param.n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([hy_param.n_hidden_2])),
    'out': tf.Variable(tf.random_normal([hy_param.num_classes]))
}
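A quick way to check that these shapes chain correctly is to list each weight and bias shape and count the trainable parameters. The sketch below assumes num_input = 784, n_hidden_1 = n_hidden_2 = 300 (the 300-neuron hidden layers used in this project) and num_classes = 10; adjust the list if your hy_param.py uses different values.

```python
# Layer sizes assumed from the text: 784 inputs, two hidden layers of 300, 10 classes.
layer_sizes = [784, 300, 300, 10]

total_params = 0
for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weight_shape = (fan_in, fan_out)  # columns of one layer = rows of the next
    bias_shape = (fan_out,)           # one bias per output unit
    layer_params = fan_in * fan_out + fan_out
    total_params += layer_params
    print(weight_shape, bias_shape, layer_params)

print("Total trainable parameters:", total_params)
```

Most of the parameters sit in the first layer, because every one of the 784 pixels connects to every hidden unit.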


Now let's set up the operation which we defined in Equation 2.1. This is the logistic regression operation:

layer_1 = tf.add(tf.matmul(X, weights['h1']), biases['b1'])
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
logits = tf.matmul(layer_2, weights['out']) + biases['out']
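Each line above computes an affine map, x W + b, for every row of the batch at once. The pure-Python sketch below spells that out with a hypothetical `dense` helper (not a TensorFlow function) and also checks a consequence worth knowing: because no activation function (such as ReLU) is applied between the layers in this model, two stacked layers are equivalent to a single affine layer. The network as written is therefore still a linear function of its input followed by softmax, which is why the operation can be described as logistic regression.

```python
def matmul(A, B):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def dense(X, W, b):
    """Affine layer X @ W + b; the bias vector b is added to every row."""
    return [[value + bias for value, bias in zip(row, b)] for row in matmul(X, W)]


X = [[1.0, 2.0], [3.0, -1.0]]                     # a batch of 2 examples
W1, b1 = [[1.0, 0.5], [0.0, 2.0]], [0.1, -0.2]    # "hidden" layer, 2 -> 2
W2, b2 = [[2.0], [-1.0]], [0.5]                   # output layer, 2 -> 1

# Two stacked layers with no activation function in between ...
two_layers = dense(dense(X, W1, b1), W2, b2)

# ... equal one layer with W = W1 @ W2 and b = b1 @ W2 + b2.
W = matmul(W1, W2)
b = [v + c for v, c in zip(matmul([b1], W2)[0], b2)]
one_layer = dense(X, W, b)

print(two_layers)
print(one_layer)
```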


The logits are converted into probabilities using tf.nn.softmax(). The softmax activation squashes the output of each unit to a value between 0 and 1, and the values across the 10 classes sum to 1.


prediction = tf.nn.softmax(logits, name='prediction' ) 
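Softmax exponentiates each logit and divides by the sum of the exponentials. A minimal pure-Python version is sketched below for intuition; it is not TensorFlow's implementation. Subtracting the largest logit first is a standard trick that leaves the result unchanged but keeps `exp` from overflowing.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(softmax([1000.0, 1000.0]))     # [0.5, 0.5], no overflow
```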

Next, let's use tf.nn.softmax_cross_entropy_with_logits to define our cost function, and optimize it with the Adam optimizer. Finally, we use the optimizer's built-in minimize() function, which computes the gradients of the loss and applies Adam's update rule (an adaptive variant of stochastic gradient descent) to each parameter in our network:








loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y) ) 
optimizer = tf.train.AdamOptimizer (learning_rate=hy_param.learning_rate) 
train_op = optimizer .minimize(loss_op) 
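For intuition about what these three lines compute, here is a small pure-Python sketch. The first function computes softmax cross-entropy directly from logits using the log-sum-exp form, which is numerically safer than taking the log of softmax probabilities. The second applies Adam's update rule, with the usual defaults (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8), to a one-parameter toy objective. Both function names and the toy objective are hypothetical; TensorFlow derives the gradients automatically, whereas here the gradient is written by hand.

```python
import math


def softmax_cross_entropy_with_logits(logits, labels):
    """Cross-entropy between (one-hot) labels and softmax(logits)."""
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    # -sum(y * log softmax(z)), where log softmax(z) = z - logsumexp(z)
    return -sum(y * (z - log_sum_exp) for y, z in zip(labels, logits))


def adam_minimize(grad_fn, x, learning_rate=0.1, steps=1000,
                  beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Minimize a one-parameter function with Adam, given its gradient."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias corrections for the
        v_hat = v / (1 - beta2 ** t)           # zero-initialized averages
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x


# Two classes, equal logits, true class 0: the loss is ln(2)
print(softmax_cross_entropy_with_logits([0.0, 0.0], [1.0, 0.0]))

# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
print(adam_minimize(lambda x: 2 * (x - 3), x=0.0, steps=2000))
```

Note that Adam's first step has a size of roughly learning_rate regardless of the gradient's magnitude, because the gradient is divided by the square root of its own running square.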


Next, we define the operations that compare the predictions with the labels; these are needed to calculate and capture the accuracy on a batch:


correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1)) 
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32) ,name='accuracy' ) 
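The accuracy computation boils down to comparing the index of the largest predicted probability with the index of the 1 in each one-hot label, then averaging the matches. A plain-Python equivalent, with hypothetical helper names and made-up predictions:

```python
def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def batch_accuracy(predictions, labels):
    """Fraction of rows whose predicted class matches the one-hot label."""
    correct = [argmax(p) == argmax(y) for p, y in zip(predictions, labels)]
    return sum(correct) / len(correct)


predictions = [[0.1, 0.7, 0.2],   # predicts class 1
               [0.8, 0.1, 0.1],   # predicts class 0
               [0.3, 0.3, 0.4],   # predicts class 2
               [0.2, 0.5, 0.3]]   # predicts class 1
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
print(batch_accuracy(predictions, labels))  # 0.75
```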
So far, we have defined a simple model architecture with two hidden layers of 300 neurons each (as shown in Figure 2.3), which will try to learn the best weight distribution using the Adam optimizer and predict the probability of each of the 10 classes. The complete model code is as follows:

import tensorflow as tf
import hy_param


## Defining Placeholders which will be used as inputs for the model
X = tf.placeholder("float", [None, hy_param.num_input], name="input_x")
Y = tf.placeholder("float", [None, hy_param.num_classes], name="input_y")

# Defining variables for weights & bias
weights = {
    'h1': tf.Variable(tf.random_normal([hy_param.num_input, hy_param.n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([hy_param.n_hidden_1, hy_param.n_hidden_2])),
    'out': tf.Variable(tf.random_normal([hy_param.n_hidden_2, hy_param.num_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([hy_param.n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([hy_param.n_hidden_2])),
    'out': tf.Variable(tf.random_normal([hy_param.num_classes]))
}

# Hidden fully connected layer 1 with 300 neurons
layer_1 = tf.add(tf.matmul(X, weights['h1']), biases['b1'])
# Hidden fully connected layer 2 with 300 neurons
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
# Output fully connected layer with a neuron for each class
logits = tf.matmul(layer_2, weights['out']) + biases['out']

# Performing softmax operation
prediction = tf.nn.softmax(logits, name='prediction')

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=hy_param.learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')


Hurray!!! The heavy lifting part of the code is done. We save the model code into the model.py file.

Figure 2.3: An illustration of the model that we created.

Build the Training Loop


The next step is to utilize the model for training and record the learned model parameters, which 
we will accomplish in train.py. 


Let's start with importing the dependencies: 


import os
import tensorflow as tf
import hy_param

# MLP Model which we defined in previous step
import model


Then we get handles to the placeholders that will be fed into our MLP:

# This will feed the raw images
X = model.X
# This will feed the labels associated with the image
Y = model.Y


Let's create the folder to save the checkpoints. Checkpoints are snapshots, taken during training, of the learned values of the weights w and biases b. We will use the tf.train.Saver class (find more details here: https://www.tensorflow.org/api_docs/python/tf/train/Saver) to save and restore the checkpoints:


checkpoint_dir = os.path.abspath(os.path.join(hy_param.checkpoint_dir, "checkpoints"))
checkpoint_prefix = os.path.join(checkpoint_dir, "model")
if not os.path.exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)

# We only keep the last 2 checkpoints to manage storage
saver = tf.train.Saver(tf.global_variables(), max_to_keep=2)
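The effect of `max_to_keep=2` can be pictured as a fixed-size queue: each new checkpoint is appended and, once the limit is exceeded, the oldest one is dropped. The sketch below only simulates that retention policy with hypothetical checkpoint names and save steps; it is not how `tf.train.Saver` is implemented internally, and it does not touch the disk.

```python
from collections import deque

max_to_keep = 2
kept = deque(maxlen=max_to_keep)  # the oldest entry is discarded automatically

for step in (20, 40, 60, 80, 100):         # hypothetical save steps
    kept.append("model-{}".format(step))   # hypothetical checkpoint prefixes
    print(step, list(kept))
```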


In order to begin training, we need to create a new session in TensorFlow. In this session, we initialize the graph variables and feed the training data to the model operations:

# Initialize the variables
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, hy_param.num_steps+1):
        # Extract the next minibatch of images and labels
        batch_x, batch_y = mnist.train.next_batch(hy_param.batch_size)
        # Run optimization op (backprop)
        sess.run(model.train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % hy_param.display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([model.loss_op, model.accuracy], feed_dict={X: batch_x,
                                                                             Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))
        if step % hy_param.checkpoint_every == 0:
            path = saver.save(
                sess, checkpoint_prefix, global_step=step)
            print("Saved model checkpoint to {}\n".format(path))

    print("Optimization Finished!")
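The loop above relies on `mnist.train.next_batch()` to hand out successive minibatches and on two modulo tests to decide when to log and when to save. The sketch below imitates both in plain Python. `MiniBatcher` is a hypothetical stand-in, not TensorFlow's class; reshuffling at the start of each epoch is a simplifying assumption of this sketch, and the `display_step` and `checkpoint_every` values are placeholders, since the real ones live in hy_param.py.

```python
import random


class MiniBatcher:
    """Hypothetical stand-in for mnist.train.next_batch()."""

    def __init__(self, images, labels, seed=0):
        self.images, self.labels = images, labels
        self.rng = random.Random(seed)
        self.order = list(range(len(images)))
        self.position = len(images)  # forces a shuffle on the first call

    def next_batch(self, batch_size):
        if self.position + batch_size > len(self.order):
            self.rng.shuffle(self.order)  # start a new epoch
            self.position = 0
        idx = self.order[self.position:self.position + batch_size]
        self.position += batch_size
        return [self.images[i] for i in idx], [self.labels[i] for i in idx]


num_steps, batch_size = 100, 128
display_step, checkpoint_every = 10, 50  # placeholder values

# Toy data: each "image" is a one-pixel list whose value equals its label
data = MiniBatcher(images=[[float(i)] for i in range(1000)], labels=list(range(1000)))
logged, saved = [], []
for step in range(1, num_steps + 1):
    batch_x, batch_y = data.next_batch(batch_size)
    if step % display_step == 0 or step == 1:
        logged.append(step)   # where the real loop prints loss and accuracy
    if step % checkpoint_every == 0:
        saved.append(step)    # where the real loop calls saver.save()

print(logged)  # [1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(saved)   # [50, 100]
```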


We will extract batches of 128 training image-label pairs from the MNIST dataset and feed them into the model. Every checkpoint_every steps, we store a checkpoint using the saver operation. The complete train.py code is as follows:
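import os
import tensorflow as tf
import hy_param

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
# Example data directory; MNIST is downloaded here if it is not already present
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# MLP Model which we defined in previous step
import model

## tf Graph input
X = model.X
Y = model.Y

checkpoint_dir = os.path.abspath(os.path.join(hy_param.checkpoint_dir, "checkpoints"))
checkpoint_prefix = os.path.join(checkpoint_dir, "model")
if not os.path.exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)
saver = tf.train.Saver(tf.global_variables(), max_to_keep=2)

# Initialize the variables
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, hy_param.num_steps+1):
        # Extracting
        batch_x, batch_y = mnist.train.next_batch(hy_param.batch_size)
        # Run optimization op (backprop)
        sess.run(model.train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % hy_param.display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([model.loss_op, model.accuracy], feed_dict={X: batch_x,
                                                                             Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))
        if step % hy_param.checkpoint_every == 0:
            path = saver.save(sess, checkpoint_prefix, global_step=step)
            print("Saved model checkpoint to {}\n".format(path))

    print("Optimization Finished!")

    # Calculate accuracy for MNIST test images
    print("Testing Accuracy:", \
        sess.run(model.accuracy, feed_dict={X: mnist.test.images,
                                            Y: mnist.test.labels}))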





Once we have executed the train.py file, we will see the progress on the console, as shown in Figure 2.3. Over the course of training, the loss generally decreases and the accuracy generally increases, although individual minibatch values fluctuate from step to step (the excerpt below shows steps 68 to 100):

Step 68, Minibatch Loss= 306.3990, Training Accuracy= 0.852
Step 69, Minibatch Loss= 300.6138, Training Accuracy= 0.844
Step 70, Minibatch Loss= 179.9460, Training Accuracy= 0.867
Step 71, Minibatch Loss= 228.6768, Training Accuracy= 0.844
Step 72, Minibatch Loss= 138.8158, Training Accuracy= 0.938
Step 73, Minibatch Loss= 175.2319, Training Accuracy= 0.906
Step 74, Minibatch Loss= 231.0065, Training Accuracy= 0.867
Step 75, Minibatch Loss= 229.3589, Training Accuracy= 0.883
Step 76, Minibatch Loss= 273.3831, Training Accuracy= 0.891
Step 77, Minibatch Loss= 205.3752, Training Accuracy= 0.891
Step 78, Minibatch Loss= 203.4548, Training Accuracy= 0.844
Step 79, Minibatch Loss= 195.2928, Training Accuracy= 0.883
Step 80, Minibatch Loss= 213.9747, Training Accuracy= 0.859
Step 81, Minibatch Loss= 284.2013, Training Accuracy= 0.836
Step 82, Minibatch Loss= 205.6791, Training Accuracy= 0.859
Step 83, Minibatch Loss= 221.5348, Training Accuracy= 0.875
Step 84, Minibatch Loss= 106.9436, Training Accuracy= 0.875
Step 85, Minibatch Loss= 209.3381, Training Accuracy= 0.867
Step 86, Minibatch Loss= 243.1940, Training Accuracy= 0.867
Step 87, Minibatch Loss= 161.6641, Training Accuracy= 0.859
Step 88, Minibatch Loss= 224.7084, Training Accuracy= 0.844
Step 89, Minibatch Loss= 211.0281, Training Accuracy= 0.875
Step 90, Minibatch Loss= 146.3944, Training Accuracy= 0.883
Step 91, Minibatch Loss= 94.1382, Training Accuracy= 0.898
Step 92, Minibatch Loss= 134.1990, Training Accuracy= 0.938
Step 93, Minibatch Loss= 232.2369, Training Accuracy= 0.867
Step 94, Minibatch Loss= 253.5543, Training Accuracy= 0.898
Step 95, Minibatch Loss= 260.2030, Training Accuracy= 0.859
Step 96, Minibatch Loss= 155.0190, Training Accuracy= 0.914
Step 97, Minibatch Loss= 239.0048, Training Accuracy= 0.828
Step 98, Minibatch Loss= 322.0966, Training Accuracy= 0.836
Step 99, Minibatch Loss= 167.7812, Training Accuracy= 0.883
Step 100, Minibatch Loss= 80.1014, Training Accuracy= 0.914
Optimization Finished!
Testing Accuracy: 0.8742

Figure 2.3: Training output with the minibatch loss and training accuracy at each step.

Also, the plot of the loss (Figure 2.4) shows it moving toward a minimum as training progresses.

[Plot: loss (y-axis) against training steps (x-axis)]
Figure 2.4: Plot of the loss values computed at each step.


It is very important to visualize how your model is performing, so that you can analyze it and prevent underfitting or overfitting. Overfitting is a very common scenario when you are dealing with deeper models. Let's spend some time understanding both in detail, along with a few tricks to overcome them.

Overfitting and Underfitting


With great power comes great responsibility, and with deeper models come deeper problems. A fundamental challenge in deep learning is striking the right balance between optimization and generalization. In the deep learning process, we tune hyperparameters and continuously configure and tweak the model to produce the best results on the data we have for training. This is optimization. The key question is: how well does our model generalize when making predictions on unseen data?


As professional deep learning engineers, our goal is to build models that generalize well in the real world. However, generalization depends on both the model architecture and the training dataset. We guide our model toward maximum utility by reducing the likelihood that it learns irrelevant patterns, or overly simple patterns, that happen to appear in the training data; otherwise, generalization suffers. A good solution is to give the model more information, that is, more training data carrying a better (more complete and often more complex) signal of what you are actually trying to model, and to work on optimizing the model architecture. Here are a few quick tricks that can improve your model by preventing overfitting:


Get more data for training.
Reduce network capacity by altering the number of layers or nodes.
Employ L2 (and try L1) weight regularization techniques.
Add dropout layers or pooling layers to the model.


L1 regularization: the cost added is proportional to the absolute values of the weight coefficients (the L1 norm of the weights).

L2 regularization: the cost added is proportional to the squares of the weight coefficients (the squared L2 norm of the weights); it is also known as weight decay.
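Both penalties are added to the data loss and scaled by a regularization strength, often written lambda. The sketch below uses made-up weights, a made-up data loss and an illustrative strength of 0.01. Some implementations include an extra factor of 1/2 in the L2 term; TensorFlow's tf.nn.l2_loss, for example, computes sum(w ** 2) / 2.

```python
def l1_penalty(weights, strength):
    """strength * sum of absolute weight values (L1 norm)."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """strength * sum of squared weight values (squared L2 norm)."""
    return strength * sum(w * w for w in weights)


weights = [0.5, -1.5, 2.0]  # illustrative values
data_loss = 0.3             # illustrative cross-entropy value
print(data_loss + l1_penalty(weights, 0.01))  # 0.3 + 0.01 * 4.0
print(data_loss + l2_penalty(weights, 0.01))  # 0.3 + 0.01 * 6.5
```

Because the L1 penalty grows linearly, it keeps pushing small weights all the way to zero, while the L2 penalty mostly shrinks large weights.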


Once the model is fully trained, its checkpoints are dumped into the folder called runs, which holds the binary checkpoint files, as shown in the following screenshot:

hy_param.py
hy_param.pyc
Inference.py
model.py
model.pyc
runs/
    checkpoints/
        checkpoint
        model-100....0-of-00001
        model-100.index
        model-100.meta
train.py

Figure 2.4: Checkpoint folder after the training process is completed

Building inference


Now we will write the inference code, which loads the latest checkpoint and then makes predictions on the basis of the learned parameters. For that, we need to create a saver operation that picks the latest checkpoint and loads its metadata. The metadata contains information about the variables and the nodes that we created in the graph:

# Pointing the model checkpoint
checkpoint_file = tf.train.latest_checkpoint(os.path.join(hy_param.checkpoint_dir, 'checkpoints'))
saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))


This matters because we want to load the same variables and operations back from the stored checkpoint. We load them into memory with tf.get_default_graph().get_operation_by_name(), passing the operation names that we defined in the model:


# Load the input variable from the model 
input_x = tf.get_default_graph().get_operation_by_name("input_x").outputs[0] 


# Load the Prediction operation 
prediction = tf.get_default_graph().get_operation_by_name("prediction").outputs[0] 


Now we need to initialize the session and pass data for a test image to the operation that makes 
the prediction: 


# Load the test data
test_data = np.array([mnist.test.images[0]])

with tf.Session() as sess:
    # Restore the model from the checkpoint
    saver.restore(sess, checkpoint_file)

    # Execute the model to make predictions
    data = sess.run(prediction, feed_dict={input_x: test_data})

    print("Predicted digit: ", data.argmax())

Putting it all together, the complete inference code looks like this:

import os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

import hy_param

from tensorflow.examples.tutorials.mnist import input_data
# Example data directory; MNIST is downloaded here if it is not already present
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Pointing the model checkpoint
checkpoint_file = tf.train.latest_checkpoint(os.path.join(hy_param.checkpoint_dir, 'checkpoints'))
saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))

# Loading test data
test_data = np.array([mnist.test.images[0]])

# Loading input variable from the model
input_x = tf.get_default_graph().get_operation_by_name("input_x").outputs[0]

# Loading Prediction operation
prediction = tf.get_default_graph().get_operation_by_name("prediction").outputs[0]

with tf.Session() as sess:
    # Restoring the model from the checkpoint
    saver.restore(sess, checkpoint_file)

    # Executing the model to make predictions
    data = sess.run(prediction, feed_dict={input_x: test_data})

    print("Predicted digit: ", data.argmax())





And we are done with our first project, which predicts the digit class of a handwritten image! Here are some of the results the model predicted when given test images from the MNIST dataset:

[Image panels: four MNIST test images, each shown as "Input image" with the model's "Predicted digit"; the legible predictions are 1, 4 and 2]
Figure 2.4: Output of the model which depicts the prediction of the model and the input image.

The conclusion to the project


Today's project was to build a classifier to solve the problem of identifying handwritten digits in a dataset of images. Our hypothetical use case was to apply Deep Learning to enable customers of a restaurant chain to write their phone numbers in a simple iPad application and get a text notification when their party could be seated. Our specific task was to build the intelligence that would drive this application.


Revisit Success Criteria: How did we do? Did we succeed? What is the impact of success? 
Just as we defined success at the beginning of the project, these are the key questions we ask as 
Deep Learning Data Scientists as we look to wrap up a project. 


Our MultiLayer Perceptron model reached a test accuracy of 87.42%! Not bad, given the depth of the model and the hyperparameters we chose at the beginning. See if you can tweak the model to get an even higher test set accuracy.


What are the implications of this accuracy? Let's calculate the incidence of an error occurring 
that would result in a customer service issue (i.e. the customer not getting the text that their table 
is ready and getting upset for an excessively long wait time at the restaurant). 


Each customer's phone number is ten digits long. Suppose our hypothetical restaurant has an average of 30 tables at each location, those tables turn over two times per night during the rush hour when the system is likely to be used, and the restaurant chain has 35 locations. This means that each day of operation, approximately 21,000 handwritten digits are captured (30 tables x 2 turns/day x 35 locations x 10 digits per phone number).


Obviously, all digits must be correctly classified for the text to reach the proper waiting restaurant patron, so any single digit misclassification causes a failure. A model accuracy of 87.42% would misclassify about 2,642 digits per day in our example (12.58% of 21,000). The worst case for this hypothetical scenario would be if the errors were spread out, with at most one misclassified digit in each phone number. Since there are only 2,100 patrons and corresponding phone numbers, fewer than the 2,642 errors, every phone number would contain an error (100% failure) and not a single customer would get their text notification that their party could be seated! The best case would be if the errors were packed together, with all ten digits misclassified in each affected phone number; that would result in about 265 wrong phone numbers out of 2,100 (a failure rate of roughly 12.6%). Still not a level of performance the restaurant chain would be likely to be happy with.
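The back-of-the-envelope numbers above are easy to reproduce. The final scenario in the sketch goes beyond the two extremes discussed in the text and rests on an explicit assumption: if each digit were misread independently with probability 1 - 0.8742, a ten-digit number would be read perfectly with probability 0.8742 to the tenth power, so roughly three in four phone numbers would fail.

```python
import math

tables, turns_per_night, locations, digits_per_number = 30, 2, 35, 10
accuracy = 0.8742

phone_numbers = tables * turns_per_night * locations   # numbers captured per day
digits = phone_numbers * digits_per_number             # digits captured per day
digit_errors = round(digits * (1 - accuracy))          # misclassified digits per day

# Errors spread out (at most one per number): as many numbers as possible are hit
worst_case_bad_numbers = min(digit_errors, phone_numbers)
# Errors packed together (ten per number): the fewest numbers are hit
best_case_bad_numbers = math.ceil(digit_errors / digits_per_number)
# Assumption: every digit is misread independently with probability 1 - accuracy
independent_failure_rate = 1 - accuracy ** digits_per_number

print(phone_numbers, digits, digit_errors)
print(worst_case_bad_numbers, best_case_bad_numbers)
print(round(best_case_bad_numbers / phone_numbers, 3), round(independent_failure_rate, 3))
```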


Words of Wisdom: Model performance may not equal system/app performance. Many factors 
contribute to a system being robust or fragile in the real world. Model performance is a key 
factor, but other items with individual fault tolerances definitely play a part. Know how your 
deep learning models integrate into the larger project so that you can set proper expectations! 

Summary


In the project in this chapter, we successfully built a MultiLayer Perceptron to classify handwritten digits. We gained experience with the MNIST dataset and with a deep neural network model architecture that gave us the added opportunity to define some key hyperparameters. Finally, we looked at the model's performance in testing and determined whether we succeeded in achieving our goals.




Word representation using Word2VEC 


Our Python Deep Learning Projects team is doing good work, and our (hypothetical) business use case has expanded! In the last project, we were asked to accurately classify handwritten digits to generate a phone number for an available-table notification text sent out to patrons of a restaurant chain. What we learned after the project was that the text the restaurant sent out had a friendly message that was well received. The restaurant was actually getting texts back!


The notification text was: "We're excited that you're here and your table is ready! See the 
greeter, and we'll seat you and your party now. :)" 


Response texts varied and were usually short, but the greeters and the restaurant management noticed them and started thinking that maybe they could use this simple system to get feedback on the dining experience. This feedback would provide needed business intelligence on how the food tasted, how the service was delivered, and the overall quality of the experience.


Define Success: The goal of this project is to build a computational linguistic model using 
Word2VEC that can take a text response (as identified in our hypothetical use case above) and 
output a sentiment classification that is meaningful. 


In this chapter, we introduce the foundational knowledge about deep learning for computational 
linguistics. 


We present the role of the dense vector representation of words in various computational 
linguistic tasks, and how to construct them from an unlabelled monolingual corpus. 


Then we'll present the role of language models in various computational linguistic tasks, such as text classification, and how to construct them from an unlabelled monolingual corpus using Convolutional Neural Networks (CNN). We'll explore the CNN architecture for language modeling.


While working with machine learning and deep learning, the structure of the data is very important. Unfortunately, raw data is often unclean and unstructured, especially in the practice of Natural Language Processing (NLP). When working with textual data, we cannot feed strings as input to most deep learning algorithms, so word embedding methods come to the rescue. Word embedding transforms textual data into dense vectors (tensors) that we can feed to the learning algorithm.


There are several ways to represent words as vectors, such as one-hot encoding, GloVe, Word2vec, and many more, and each of them has its pros and cons. Our current favorite is Word2VEC, because it learns high-quality features efficiently from large, unlabelled corpora.


If you have ever worked on a use case in which the input data is text, then you know that it is a really messy affair: you have to teach a computer about the irregularities of human language, which is full of ambiguities, and about the hierarchical and sparse nature of its grammar. This is the kind of problem that word vectors promise to help with, by placing words with similar meanings close together so that related concepts end up with similar representations.


In this chapter, we will learn how to build word2vec models and analyze what characteristics we can learn about the provided corpus. Also, we will learn how to build a language model utilizing a CNN with trained word vectors.




Learning Word Vectors 


To implement a fully functional word embedding model, we will perform the following steps:

1. Load all the dependencies
2. Prepare the Text Corpus
3. Define the model
4. Train the model
5. Analyse the model
6. Plot the word cluster using the t-SNE algorithm
7. Plot the model on TensorBoard


Let's make some world-class word embedding models! 


Code link: https://github.com/PacktPublishing/Python-Deep-Learning-Projects/blob/master/Chapter3/create_word2vec.ipynb

Load all the dependencies


In this chapter, we will be using the Gensim module (https://github.com/RaRe-Technologies/gensim) to train our word2vec model. Gensim provides large-scale multicore processing support for many popular algorithms like Latent Dirichlet Allocation (LDA), Hierarchical Dirichlet Process (HDP), and word2vec. There are other approaches we could take, like using TensorFlow (https://github.com/tensorflow/models/blob/master/tutorials/embedding/word2vec_optimized.py) to define our own computation graph and build the model, and we will look into that later on.


Know the Code! Python dependencies are quite manageable. You can learn more here: https://packaging.python.org/tutorials/managing-dependencies/.


"This tutorial walks you through the use of Pipenv to manage dependencies for an application. It 
will show you how to install and use the necessary tools and make strong recommendations on 
best practices. Keep in mind that Python is used for a great many different purposes, and 
precisely how you want to manage your dependencies may change based on how you decide to 
publish your software. The guidance presented here is most directly applicable to the 
development and deployment of network services (including web applications), but is also very 
well suited to managing development and testing environments for any kind of project." 


We will be using the seaborn package to plot the word clusters, sklearn to implement the t-SNE algorithm, and TensorFlow for building TensorBoard plots:


import multiprocessing
import os, json, requests
import re
import nltk
import gensim.models.word2vec as w2v
import sklearn.manifold
import pandas as pd
import seaborn as sns
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

Prepare the Text Corpus


We will use a pretrained NLTK tokenizer (http://www.nltk.org/index.html) and the English stopword list to clean our corpus and extract the relevant words from it. We will also create a small function that cleans each unprocessed sentence and outputs a list of words.

# Download NLTK tokenizer models (only needed the first time)
nltk.download("punkt")
nltk.download("stopwords")

def sentence_to_wordlist(raw):
    # Replace every character that is not a letter with a space
    clean = re.sub("[^a-zA-Z]", " ", raw)
    words = clean.split()
    # Return a list (not a lazy map) so gensim can iterate over it more than once
    return [word.lower() for word in words]


Since we haven't yet captured the data from the text responses in our hypothetical business use case, let's collect a good-quality dataset available on the web. Demonstrating our understanding and skills with this corpus will prepare us for the hypothetical business use case data. You can also use your own dataset, but it's important to have a large number of words so that the word2vec model can generalize well. We will load our data from the Project Gutenberg website (gutenberg.org).


Then we tokenize the raw corpus into sentences, and each sentence into a list of clean words, as shown in the figure below.

Figure 3.1: This process depicts the data transformation from raw data to the list of words that will be fed into the word2vec model.


# Article on Earth from the Gutenberg website
filepath = 'http://www.gutenberg.org/files/33224/33224-0.txt'

corpus_raw = requests.get(filepath).text

tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
raw_sentences = tokenizer.tokenize(corpus_raw)

# Sentences where each word is tokenized
sentences = []
for raw_sentence in raw_sentences:
    if len(raw_sentence) > 0:
        sentences.append(sentence_to_wordlist(raw_sentence))
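The whole preparation step can be tried out without downloading anything. In the sketch below, a naive regular-expression sentence splitter stands in for NLTK's punkt tokenizer (an assumption made only to keep the example self-contained; punkt handles abbreviations and other edge cases far better), and the corpus is a made-up three-sentence string.

```python
import re


def naive_sentence_split(text):
    """Hypothetical stand-in for punkt: split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_to_wordlist(raw):
    """Keep letters only, split on whitespace and lowercase every word."""
    clean = re.sub("[^a-zA-Z]", " ", raw)
    return [word.lower() for word in clean.split()]


corpus_raw = "The Earth is a planet. It orbits the Sun once a year! Does the Moon orbit the Earth?"

sentences = []
for raw_sentence in naive_sentence_split(corpus_raw):
    if len(raw_sentence) > 0:
        sentences.append(sentence_to_wordlist(raw_sentence))

for sentence in sentences:
    print(sentence)
```

The result is a list of sentences, each a list of lowercase words, which is the input format gensim's Word2Vec expects.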

Defining Our Word2vec Model


Now let's use gensim to define our word2vec model. To begin, let's define a few hyperparameters for our model, such as the dimension, which is the number of features we want to learn for each word. Loosely speaking, directions in this space can capture concepts such as gender, objects, or age, although individual dimensions are rarely interpretable on their own.


Computational Linguistics Model Tip #1: Increasing the number of dimensions lets the vectors capture more information, but it also adds computational complexity and can require more training data to pay off. The right number is an empirical question for you to determine as an Applied AI Deep Learning Engineer!


Computational Linguistics Model Tip #2: Pay attention to context_size. It is important because it sets the maximum distance, within a sentence, between the current word and the words it is used to predict. This helps the model learn the relationships between a word and the other words near it.
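Because the model below is created with sg=1 (skip-gram), each word is used to predict the words around it, up to context_size positions away. The sketch lists those (center, context) training pairs for a made-up sentence. It is a simplification: it ignores gensim's downsampling of frequent words and the random shrinking of the effective window that word2vec applies to each word.

```python
def skipgram_pairs(sentence, window):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(sentence):
        start = max(0, i - window)
        end = min(len(sentence), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, sentence[j]))
    return pairs


sentence = ["the", "moon", "orbits", "the", "earth"]
print(skipgram_pairs(sentence, window=1))
print(len(skipgram_pairs(sentence, window=2)))  # 14
```

With a window of 7, each word in a long sentence can be paired with up to 14 neighbors, so larger windows produce many more training pairs per sentence.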


Using gensim, we will define our model, including all the hyperparameters:


# Dimensionality of the word vectors
num_features = 300

# Minimum word count threshold
min_word_count = 3

# Number of threads to run in parallel
# (more workers, faster training)
num_workers = multiprocessing.cpu_count()

# Context window length
context_size = 7

# Downsample setting for frequent words
# (gensim's documentation gives (0, 1e-5) as a useful range)
downsampling = 1e-3

# Seed for the random number generator
seed = 1

model2vec = w2v.Word2Vec(
    sg=1,                      # 1 = skip-gram, 0 = CBOW
    seed=seed,
    workers=num_workers,
    size=num_features,         # this argument is named vector_size in gensim 4.x
    min_count=min_word_count,
    window=context_size,
    sample=downsampling
)

model2vec.build_vocab(sentences)
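Part of what build_vocab() does is count every word in the corpus and drop those that appear fewer than min_count times. The sketch below shows only that counting-and-filtering idea on made-up sentences, with a hypothetical `build_vocab` function; gensim's own build_vocab() does more, such as preparing the downsampling of frequent words.

```python
from collections import Counter


def build_vocab(sentences, min_count):
    """Count words across all sentences and keep those seen >= min_count times."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word: count for word, count in counts.items() if count >= min_count}


sentences = [["the", "earth", "orbits", "the", "sun"],
             ["the", "moon", "orbits", "the", "earth"],
             ["the", "earth", "is", "round"]]

vocab = build_vocab(sentences, min_count=3)
print(vocab)  # {'the': 5, 'earth': 3}
```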

Let's Train The Model


Once we have configured the gensim word2vec object, we need to train the model. Be prepared: this might take some time, depending on the amount of data and the computational power you have. In this process, we have to define the number of epochs to run, which can vary depending on your data size. You can play around with these values and evaluate your word2vec model's performance.


Also, we will save the trained model so that we can use it later on while building our language models:

# Start training; this might take a minute or two
model2vec.train(sentences, total_examples=model2vec.corpus_count, epochs=100)

# Save to file; this can be useful later
if not os.path.exists("trained"):
    os.makedirs("trained")

model2vec.save(os.path.join("trained", "sample.w2v"))

Once the training process is complete, you will find a binary file stored in trained/sample.w2v. You can share this file with others and load it later (with w2v.Word2Vec.load()) into any other NLP task.

Analysing The Model


Now that we have trained our word2vec model, let's explore what it was able to learn. We will use most_similar() to explore the relations between various words. In the following example, you can see that the model learned that the word earth is related to crust, globe, and other words. It is interesting that we provided only the raw data, and the model was able to learn all these relations and concepts automatically!


model2vec.wv.most_similar("earth")

[(u'crust', 0.6946468353271484),
 (u'globe', 0.6748907566070557),
 (u'inequalities', 0.6181437969207764),
 (u'planet', 0.6092090606689453),
 (u'orbit', 0.6079996824264526),
 (u'laboring', 0.6058655977249146),
 (u'sun', 0.5901342630386353),
 (u'reduce', 0.5893668532371521),
 (u'moon', 0.5724939107894897),
 (u'eccentricity', 0.5709577798843384)]
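most_similar() ranks the vocabulary by cosine similarity to the query word. The sketch below reproduces that ranking on three-dimensional toy vectors invented for illustration (the trained vectors here have 300 dimensions), using hypothetical `cosine` and `most_similar` helpers rather than gensim's code.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`."""
    query = vectors[word]
    scores = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


toy_vectors = {  # invented 3-D vectors, for illustration only
    "earth": [0.9, 0.1, 0.0],
    "globe": [0.8, 0.2, 0.1],
    "planet": [0.7, 0.3, 0.0],
    "bread": [0.0, 0.1, 0.9],
}

for other, score in most_similar("earth", toy_vectors):
    print(other, round(score, 3))
```

Cosine similarity ignores vector length and compares only direction, which is why scores fall between -1 and 1 regardless of how frequent a word is.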


Let's find the words most related to human and see what the model has learned:

model2vec.wv.most_similar("human")

[(u'art', 0.6744576692581177),
 (u'race', 0.6348963975906372),
 (u'industry', 0.6203593611717224),
 (u'man', 0.6148483753204346),
 (u'population', 0.6090731620788574),
 (u'mummies', 0.5895125865936279),
 (u'gods', 0.5859177112579346),
 (u'domesticated', 0.5857442021369934),
 (u'lives', 0.5848811864852905),
 (u'figures', 0.5809590816497803)]

Critical thinking Tip: It's interesting to observe that 'art', 'race', and 'industry' are the most similar outputs. Remember that these similarities are based on the corpus of text that we used for training and should be thought of in that context. Generalization and its unwanted sidekick, bias, can come into play when similarities learned from an outdated or dissimilar training corpus are applied to a new set of language data or cultural norms.


We can even derive an analogy: using earth and moon as positive vectors and orbit as a negative vector, the model predicts the word sun. This makes sense, because there is a semantic relation between the moon orbiting the earth and the earth orbiting the sun.

model2vec.wv.most_similar_cosmul(positive=['earth', 'moon'], negative=['orbit'])


(u'sun', 0.8161555624008179) 
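most_similar_cosmul() scores each candidate word by multiplying its similarities to the positive words and dividing by its similarities to the negative words, following the multiplicative objective of Levy and Goldberg that gensim's documentation cites. The sketch below is a simplified, hypothetical re-implementation on invented toy vectors; shifting cosine similarities into [0, 1] with (1 + cos) / 2 and the small epsilon in the denominator reflect our understanding of gensim's approach and should be treated as assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar_cosmul(positive, negative, vectors, topn=1, eps=1e-6):
    """Score = product of shifted positive similarities / product of negative ones."""
    scores = []
    for word, vec in vectors.items():
        if word in positive or word in negative:
            continue  # the query words themselves are not valid answers
        numerator = 1.0
        for p in positive:
            numerator *= (1 + cosine(vec, vectors[p])) / 2  # shift [-1, 1] to [0, 1]
        denominator = 1.0
        for n in negative:
            denominator *= (1 + cosine(vec, vectors[n])) / 2
        scores.append((word, numerator / (denominator + eps)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


toy_vectors = {  # invented 3-D vectors, for illustration only
    "earth": [1.0, 0.5, 0.0],
    "moon": [1.0, 0.0, 0.5],
    "orbit": [0.0, 1.0, 1.0],
    "sun": [1.0, 0.0, 0.0],
    "bread": [0.0, 0.0, 1.0],
}
print(most_similar_cosmul(["earth", "moon"], ["orbit"], toy_vectors))
```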


So, we learned that a word2vec model can derive valuable information from raw, unlabeled
data. This process is crucial for learning the grammar of a language and the semantic
correlations between words.


Later, we will learn how to use these word2vec features as input for a classification model,
which helps boost the model's accuracy and performance.
Plotting the Word Cluster Using the t-SNE Algorithm


After our analysis, we know that our word2vec model has learned some concepts from the
provided corpus, but how do we visualize them? Because we created a 300-dimensional space
to learn the features, visualizing it directly is practically impossible. To make it possible, we will
use a dimensionality reduction algorithm called t-SNE, which is well known for reducing a
high-dimensional space to a more humanly understandable 2- or 3-dimensional space.


"t-Distributed Stochastic Neighbor Embedding (t-SNE) (https://lvdmaaten.github.io/tsne/) is a 
(prize-winning) technique for dimensionality reduction that is particularly well suited for the 
visualization of high-dimensional datasets. The technique can be implemented via Barnes-Hut 
approximations, allowing it to be applied on large real-world datasets. We applied it on data sets 
with up to 30 million examples." 

-- Laurens van der Maaten


To implement this, we will use the sklearn package and set n_components=2, which means we
want a 2-dimensional space as the output. Next, we perform the transformation by feeding the
word vectors into the t-SNE object.


After this step, we have a pair of values for each word, which we can use as its x- and
y-coordinates to plot it in the 2D plane. Let's prepare a DataFrame that stores all the words and
their x, y coordinates in the same variable, as shown in figure 3.2, and take the data from there
to create a scatter plot:


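import sklearn.manifold
import pandas as pd
import seaborn as sns

# Reduce the 300-dimensional word vectors to 2 dimensions
tsne = sklearn.manifold.TSNE(n_components=2, random_state=0)
all_word_vectors_matrix = model2vec.wv.vectors
all_word_vectors_matrix_2d = tsne.fit_transform(all_word_vectors_matrix)

# One row per word: the word and its x, y coordinates.
# model2vec.wv.vocab[word].index is the gensim 3.x API;
# gensim 4.x replaces it with model2vec.wv.key_to_index[word].
points = pd.DataFrame(
    [
        (word, coords[0], coords[1])
        for word, coords in [
            (word, all_word_vectors_matrix_2d[model2vec.wv.vocab[word].index])
            for word in model2vec.wv.vocab
        ]
    ],
    columns=["word", "x", "y"]
)

# Scatter plot of every word in the 2D plane
sns.set_context("poster")
ax = points.plot.scatter("x", "y", s=10, figsize=(20, 12))
fig = ax.get_figure()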


This is our DataFrame, containing each word and its x and y coordinates:
   word          x           y
0  writings     -7.430655   -2.197504
1  grossier     -6.101781    6.286809
2  yellow        0.267137    9.981658
3  four          9.251252    1.031454
4  woods         1.035477   13.307897
5  preface      -9.289152    0.782565
6  woody        -0.714374   -6.950973
7  increase      4.004862    0.776931
8  granting      1.398124    5.031035
9  electricity   8.233945   -7.784630


Figure 3.2 Word list with the coordinate values obtained using t-SNE 
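To try the projection without the trained model, the sketch below runs the same t-SNE step on random vectors and builds a DataFrame with the same word, x, y layout as Figure 3.2. The 60 random 300-D vectors, the placeholder word names and perplexity=10 are assumptions for illustration; recent scikit-learn versions require the perplexity to be smaller than the number of samples, so the default of 30 suits a real vocabulary better than tiny toy data. With random data the resulting plot shows no meaningful clusters.

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

# Hypothetical stand-in for model2vec.wv.vectors: 60 random 300-D vectors
rng = np.random.RandomState(0)
words = ["word_%d" % i for i in range(60)]
word_vectors = rng.normal(size=(60, 300)).astype("float32")

# Project to 2D; perplexity is kept well below the number of samples
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
coords_2d = tsne.fit_transform(word_vectors)

# One row per word with its x and y coordinates
points = pd.DataFrame({"word": words, "x": coords_2d[:, 0], "y": coords_2d[:, 1]})
print(points.head())
```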


This is what the entire cluster looks like after plotting 425,633 tokens in the 2D plane. The
position of each point reflects the features and correlations the model learned from the nearby words:
Figure 3.3 Scatter plot of all the unique words in the 2D plane.
Visualizing the Embedding Space - Plotting the Model on TensorBoard


But visualization has no benefit if you cannot use it to understand how and what the model
has learned. To gain better intuition about what the model has learned, we will use TensorBoard.


TensorBoard is a powerful tool for building various kinds of plots: you can monitor your models
during training, visualize deep learning architectures, and explore word embeddings. Let's
build a TensorBoard embedding projection and use it for various kinds of analysis.


To build an embedding plot in TensorBoard, we need to perform the following steps:

1. Collect the words and the respective tensors (300-D vectors) that we learned in the previous steps.
2. Create a variable in the graph that will hold the tensors.
3. Initialize the projector.
4. Include an appropriately named embedding layer.
5. Store all the words in a .tsv formatted metadata file. TensorBoard uses this file type to load and display the words.
6. Link the .tsv metadata file to the projector object.
7. Define a function that will store all the summary checkpoints.

This is the code to complete these 7 steps: 


import io
import os
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector  # TensorFlow 1.x

# 1. Words and their 300-D vectors, in the same row order, so that
#    line i of the metadata file describes row i of the embedding matrix
vocab_list = model2vec.wv.index2word  # gensim 3.x; model2vec.wv.index_to_key in gensim 4.x
embeddings = all_word_vectors_matrix

# 2. A graph variable that holds the vectors
embedding_var = tf.Variable(embeddings, dtype='float32', name='embedding')

# 3-4. Initialize the projector and add a named embedding
projector_config = projector.ProjectorConfig()
embedding = projector_config.embeddings.add()
embedding.tensor_name = embedding_var.name

# 5. Metadata file with one word per line
LOG_DIR = './'
metadata_file = 'sample.tsv'
with io.open(os.path.join(LOG_DIR, metadata_file), 'w', encoding='utf-8') as metadata:
    metadata.writelines(u"%s\n" % w for w in vocab_list)

# 6. Link the metadata file to the projector
embedding.metadata_path = os.path.join(os.getcwd(), metadata_file)

# 7. Use the same LOG_DIR where you stored your checkpoint.
summary_writer = tf.summary.FileWriter(LOG_DIR)

# The next line writes a projector_config.pbtxt in the LOG_DIR. TensorBoard will
# read this file during startup.
projector.visualize_embeddings(summary_writer, projector_config)

saver = tf.train.Saver([embedding_var])
with tf.Session() as sess:
    # Initialize the model
    sess.run(tf.global_variables_initializer())
    saver.save(sess, os.path.join(LOG_DIR, metadata_file + '.ckpt'))
Once the TensorBoard preparation code is executed, the binaries, metadata, and checkpoints are
stored on disk (as shown in the following figure):


checkpoint    projector_config.pbtxt                sample.tsv.ckpt.index   url.txt
datalab       sample.tsv                            sample.tsv.ckpt.meta
nltk_data     sample.tsv.ckpt.data-00000-of-00001   trained


Figure 3.4 Outputs created by the Tensorboard 


To launch TensorBoard, run the following command from a command prompt:


tensorboard --logdir=/path/of/the/checkpoint/ 


Now open http://localhost:6006/#projector in a browser. This is TensorBoard with all the data
points projected into 3D space. You can zoom in and out, search for a specific word, and also
re-run t-SNE to visualize the cluster formation of the dataset:
Figure 3.5 TensorBoard embedding projection

Data visualization helps you tell your story, and TensorBoard is very cool! Business stakeholders
love impressive, dynamic data visualizations. They also help build intuition about your model and
generate new hypotheses to test.
Building a language model using CNN + word2vec


Now that we have learned core computational linguistics concepts and trained relations from the
provided dataset, we can use what we learned to implement a language model that performs a
practical task.


In this section, we will build a text classification model to perform sentiment analysis. For
classification, we will combine a CNN with a pre-trained word2vec model, building on what we
learned in the previous section of this chapter.


This task simulates our hypothetical business use case: taking text responses from restaurant
patrons and classifying what they text back into classes that are meaningful for the restaurant.


We were inspired by Denny Britz's (https://twitter.com/dennybritz) work on Implementing a CNN
for Text Classification in TensorFlow (http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/)
in our own CNN and text classification build. We invite you to review his blog post to gain a
more complete understanding of the internal mechanisms that make CNNs useful for text
classification.


As an overview, this architecture starts with an input embedding step, followed by 2D convolutions
with multiple filters and max pooling, and ends with a softmax activation layer that produces the output.
Exploring The CNN Model


You might be asking yourself: how do you use CNNs to classify text when they are most
commonly used in image processing?


Several works in the literature, such as [1], [2], and [3], have shown that CNNs are generic
feature extractors that provide location invariance and compositionality. Location invariance
helps the model capture the context of words regardless of where they occur in the text, and
compositionality helps it derive higher-level representations from lower-level features.


[1]: Convolutional Neural Networks for Sentence Classification
[2]: A CNN Based Scene Chinese Text Recognition Algorithm with Synthetic Data Engine
[3]: Text-attentional Convolutional Neural Networks for Scene Text Detection
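The location-invariance property can be illustrated with a minimal NumPy sketch: a filter produces its peak response wherever a matching phrase appears in the sentence matrix, and max pooling keeps only that peak, so the pooled feature is the same whether the phrase is at the start or the end. The word vectors, phrase and filter below are hypothetical toy values, not learned weights.

```python
import numpy as np

def conv1d_valid(sentence, filt):
    """Slide a filter of height k over a (num_words, dim) sentence matrix ('VALID' padding)."""
    k = filt.shape[0]
    return np.array([np.sum(sentence[i:i + k] * filt)
                     for i in range(sentence.shape[0] - k + 1)])

rng = np.random.RandomState(0)
dim = 4
filler = rng.normal(scale=0.1, size=(6, dim))   # hypothetical "neutral" word vectors
phrase = np.ones((2, dim))                      # hypothetical 2-word phrase the filter detects
filt = np.ones((2, dim))                        # filter spanning 2 words

# The same phrase at the start vs. the end of the sentence
sent_a = np.vstack([phrase, filler])
sent_b = np.vstack([filler, phrase])

fmap_a = conv1d_valid(sent_a, filt)
fmap_b = conv1d_valid(sent_b, filt)
print(fmap_a.argmax(), fmap_b.argmax())   # the peak sits at different positions
print(fmap_a.max(), fmap_b.max())         # but the max-pooled value is the same
```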


So instead of feeding the pixel values of an image into the model, we feed one-hot encoded word
vectors or a word2vec matrix in which each row represents a word (or a character, for character-based
models). The implementation by Denny Britz has two filters for each of three region sizes: two,
three, and four. These filters slide over the sentence matrix, and the convolution operation
generates feature maps. Downsampling is performed by a max pooling operation over each
activation map. Finally, all the outputs are concatenated and passed into the softmax classifier.


Because we are performing sentiment analysis, there are two output classes: positive and
negative. The softmax classifier outputs a probability for each class:
Figure 3.6: Taken from Denny's blog post, describing the functioning of the CNN language model.
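To make the data flow concrete, here is a minimal NumPy sketch of one forward pass through this kind of architecture for a single sentence: embedding lookup, convolution with two filters for each of the region sizes 2, 3 and 4, ReLU, max pooling, concatenation, and a softmax output layer. It is not the TensorFlow implementation used in this chapter: all sizes and weights are hypothetical random values, and the convolution biases and dropout are omitted for brevity.

```python
import numpy as np

rng = np.random.RandomState(42)

# Hypothetical hyperparameters, chosen small for readability
vocab_size, embedding_size, sequence_length = 50, 8, 10
filter_sizes, num_filters, num_classes = [2, 3, 4], 2, 2

# Embedding lookup: word indices -> sentence matrix (sequence_length x embedding_size)
W_embed = rng.uniform(-1.0, 1.0, size=(vocab_size, embedding_size))
sentence = rng.randint(0, vocab_size, size=sequence_length)
x = W_embed[sentence]

pooled = []
for k in filter_sizes:
    # num_filters filters, each covering k words and the full embedding width
    filters = rng.normal(scale=0.1, size=(num_filters, k, embedding_size))
    # 'VALID' convolution: one activation per window of k consecutive words
    fmap = np.array([[np.sum(x[i:i + k] * f) for i in range(sequence_length - k + 1)]
                     for f in filters])        # (num_filters, sequence_length - k + 1)
    fmap = np.maximum(fmap, 0.0)               # ReLU
    pooled.append(fmap.max(axis=1))            # max pooling over positions -> (num_filters,)

# Concatenate: num_filters * len(filter_sizes) features
h = np.concatenate(pooled)

# Fully connected layer (xW + b) followed by softmax over the two classes
W_out = rng.normal(scale=0.1, size=(h.shape[0], num_classes))
b_out = np.full(num_classes, 0.1)
scores = h @ W_out + b_out
probs = np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()
print(h.shape, probs)
```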


Let’s look at the implementation of the model. We modified the existing implementation to take
the previously trained word2vec model as an additional input.


Code Link: https://github.com/PacktPublishing/Python-Deep-Learning-Projects/tree/master/Chapter3/sentiment_analysis


The model resides in text_cnn.py. We created a class named TextCNN that takes a few parameters
for the model's configuration, also known as hyperparameters:


sequence_length: The fixed sentence length.

num_classes: The number of output classes produced by the softmax activation (positive and negative).

vocab_size: The number of unique words in our embeddings.

embedding_size: The dimensionality of the embeddings.

filter_sizes: The number of words each convolutional filter covers (one entry per filter size).

num_filters: The number of filters for each filter size.

l2_reg_lambda: The weight of the L2 regularization term in the loss (0.0 by default).

pre_trained: Whether to initialize the embedding layer with the previously trained word2vec representation.


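import tensorflow as tf
import numpy as np


class TextCNN(object):
    """CNN for text classification: embedding, convolution + max pooling, dropout, softmax."""

    def __init__(
            self,
            sequence_length,
            num_classes,
            vocab_size,
            embedding_size,
            filter_sizes,
            num_filters,
            l2_reg_lambda=0.0,
            pre_trained=False):
        # The six parts below make up the body of __init__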





The code is divided into 6 main parts: 


1. Placeholders for inputs: All the placeholders that hold the input values for our model are
defined first. In this case, the inputs are the sentence vector and the associated label (either
positive or negative). input_x holds the sentence, input_y holds the label, and dropout_keep_prob
holds the probability of keeping a neuron in the dropout layer, as shown in the following code:
# Placeholders for input, output and dropout
self.input_x = tf.placeholder(
    tf.int32, [None, sequence_length], name="input_x")
self.input_y = tf.placeholder(
    tf.float32, [None, num_classes], name="input_y")
self.dropout_keep_prob = tf.placeholder(
    tf.float32, name="dropout_keep_prob")

# Keeping track of the L2 regularization loss
l2_loss = tf.constant(0.0)





2. Embedding: The embedding layer is our model's first layer, into which we feed the word
representations learned while training word2vec. We modify the baseline code in the repository
to use our pre-trained embedding model instead of learning the embedding from scratch, which
can improve the model's accuracy. This is also a kind of transfer learning, where we transfer the
general knowledge learned from a generic Wikipedia or social media corpus. The embedding
matrix initialized with the word2vec model is named W:


# Embedding layer
with tf.device('/cpu:0'), tf.name_scope("embedding"):
    if pre_trained:
        # Non-trainable matrix filled with the pre-trained vectors fed via a placeholder
        W_ = tf.Variable(
            tf.constant(0.0, shape=[vocab_size, embedding_size]),
            trainable=False,
            name="W")
        self.embedding_placeholder = tf.placeholder(
            tf.float32, [vocab_size, embedding_size],
            name="embedding_placeholder")
        W = tf.assign(W_, self.embedding_placeholder)
    else:
        # Learn the embedding from scratch, starting from random values
        W = tf.Variable(
            tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
            name="W")

    self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
    self.embedded_chars_expanded = tf.expand_dims(
        self.embedded_chars, -1)





3. Convolution with max pooling: The convolution layer is defined with tf.nn.conv2d(), which
takes the expanded output of the embedding layer and a filter matrix W as inputs; a nonlinear
ReLU activation is then applied. Next, max pooling is performed over the output for each filter
size using tf.nn.max_pool(). The results are concatenated into a single vector that becomes the
input for the next layer of the model.
# Create a convolution + max-pool layer for each filter size
pooled_outputs = []
for i, filter_size in enumerate(filter_sizes):
    with tf.name_scope("conv-maxpool-%s" % filter_size):
        # Convolution layer
        filter_shape = [filter_size, embedding_size, 1, num_filters]
        W = tf.Variable(
            tf.truncated_normal(filter_shape, stddev=0.1), name="W")
        b = tf.Variable(
            tf.constant(0.1, shape=[num_filters]), name="b")
        conv = tf.nn.conv2d(
            self.embedded_chars_expanded,
            W,
            strides=[1, 1, 1, 1],
            padding="VALID",
            name="conv")
        # Apply nonlinearity
        h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
        # Max pooling over the outputs
        pooled = tf.nn.max_pool(
            h,
            ksize=[1, sequence_length - filter_size + 1, 1, 1],
            strides=[1, 1, 1, 1],
            padding="VALID",
            name="pool")
        pooled_outputs.append(pooled)

# Combine all the pooled features
num_filters_total = num_filters * len(filter_sizes)
self.h_pool = tf.concat(pooled_outputs, 3)
self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])





4. Dropout layer: To regularize the CNN and prevent the model from overfitting, a small
percentage of the neurons' signals is randomly blocked during training. This forces the model
to learn features that are individually useful.


# Add dropout
with tf.name_scope("dropout"):
    self.h_drop = tf.nn.dropout(self.h_pool_flat,
                                self.dropout_keep_prob)
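In TensorFlow 1.x, tf.nn.dropout(x, keep_prob) keeps each element with probability keep_prob and scales the kept elements by 1/keep_prob, so the expected sum of the activations is unchanged. A minimal NumPy sketch of that behavior follows; the input of all ones and keep_prob=0.5 are illustrative assumptions. During evaluation you would typically feed dropout_keep_prob = 1.0 so that nothing is dropped.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob and scale it by 1/keep_prob."""
    mask = rng.uniform(size=x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.RandomState(0)
activations = np.ones(10000)
dropped = dropout(activations, keep_prob=0.5, rng=rng)

# Roughly half the units are zeroed, but the mean activation stays close to 1
print((dropped == 0).mean(), dropped.mean())
```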





5. Prediction: TensorFlow's tf.nn.xw_plus_b() wrapper computes xW + b, where x is the output of
the previous layer. This computation produces the scores, and tf.argmax() turns the scores into
predictions.
# Final (unnormalized) scores and predictions
with tf.name_scope("output"):
    W = tf.get_variable(
        "W",
        shape=[num_filters_total, num_classes],
        initializer=tf.contrib.layers.xavier_initializer())
    b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
    l2_loss += tf.nn.l2_loss(W)
    l2_loss += tf.nn.l2_loss(b)
    self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
    self.predictions = tf.argmax(self.scores, 1, name="predictions")





6. Loss and accuracy: We can define the loss function using our scores. Remember that the
measurement of the error our network makes is called loss. As good deep learning engineers, we
want to minimize it and make our model more accurate. For classification problems, the
cross-entropy loss (http://cs231n.github.io/linear-classify/#softmax) is the standard loss function.


# Calculate mean cross-entropy loss
with tf.name_scope("loss"):
    losses = tf.nn.softmax_cross_entropy_with_logits(
        labels=self.input_y, logits=self.scores)
    self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss

# Accuracy
with tf.name_scope("accuracy"):
    correct_predictions = tf.equal(self.predictions,
                                   tf.argmax(self.input_y, 1))
    self.accuracy = tf.reduce_mean(
        tf.cast(correct_predictions, "float"), name="accuracy")
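The loss and accuracy definitions can be checked by hand with a small NumPy sketch. It mirrors the formulas above (mean softmax cross-entropy, plus l2_reg_lambda times tf.nn.l2_loss, which computes sum(t ** 2) / 2), not TensorFlow's implementation; the scores, labels, weights and lambda are made-up values.

```python
import numpy as np

def softmax(scores):
    # Subtract the row max for numerical stability
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical scores for 3 sentences and their one-hot labels ([0, 1] = positive)
scores = np.array([[2.0, 0.5], [0.1, 1.5], [0.3, 0.2]])
labels = np.array([[1, 0], [0, 1], [0, 1]])
weights = np.array([[0.5, -0.5], [1.0, 0.0]])   # hypothetical output-layer weights
l2_reg_lambda = 0.1

# Per-sentence softmax cross-entropy, then the mean over the batch
cross_entropy = -np.sum(labels * np.log(softmax(scores)), axis=1)
# Same definition as tf.nn.l2_loss: half the sum of squares
l2_loss = np.sum(weights ** 2) / 2
loss = cross_entropy.mean() + l2_reg_lambda * l2_loss

# Accuracy: fraction of sentences whose argmax prediction matches the label
accuracy = np.mean(np.argmax(scores, axis=1) == np.argmax(labels, axis=1))
print(loss, accuracy)
```

Here the third sentence is misclassified (its scores favor the negative class), so the accuracy is 2/3.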





That’s it, we’re done with our model. Let's use TensorBoard to visualize the network and 
improve our intuition: 
Figure 3.7: The CNN model architecture definition.
Understanding Data Format


An interesting dataset, the Movie Review data from Rotten Tomatoes
(http://www.cs.cornell.edu/people/pabo/movie-review-data/), was used in this case. Half of the
reviews are positive and the other half negative, with about 10,000 review sentences in total and
around 20,000 different words in the vocabulary. The dataset is stored in the data folder.

It contains 2 files: rt-polarity.neg, containing all the negative sentences, and rt-polarity.pos,
containing only the positive sentences. To perform classification, we need to associate them with
labels. Each positive sentence is associated with the one-hot encoded label [0, 1], and each
negative sentence with [1, 0], as shown in the following figure:


Training data                                                                  Training label
the rock is destined to be the 21st century's new " conan " and that h...      [0, 1]
the gorgeously elaborate continuation of " the lord of the rings " tri...      [0, 1]
effective but too-tepid biopic                                                 [0, 1]
if you sometimes like to go to the movies to have fun , wasabi is a go ...     [0, 1]
emerges as something rare , an issue movie that's so honest and keenly ...     [0, 1]
the film provides some great insight into the neurotic mindset of all...       [0, 1]
offers that rare combination of entertainment and education .                  [0, 1]
perhaps no picture ever made has more literally showed that the road t...      [0, 1]
steers turns in a snappy screenplay that curls at the edges ; it's so...       [0, 1]
take care of my cat offers a refreshingly different slice of asian cin...      [0, 1]
this is a film well worth seeing , talking and singing heads and all. ...      [0, 1]
what really surprises about wisegirls is its low-key quality and genui ...     [0, 1]
( wendigo is ) why we go to the cinema : to be fed through the eye , ...       [0, 1]
one of the greatest family-oriented , fantasy-adventure movies ever. ...       [0, 1]
ultimately , it ponders the reasons we need stories so much .                  [0, 1]
an utterly compelling 'who wrote it' in which the reputation of the mo...      [0, 1]


Figure 3.8: A sample of positive sentences and the labels associated with them.


Pre-processing the text data is done in the next four steps: 
1. Load: both the positive and negative sentence data files.

2. Clean: use regex to remove punctuation and other special characters.

3. Pad: make each sentence the same length by appending <PAD> tokens.

4. Index: map each word to an integer index so that each sentence becomes a vector of integers.
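The four pre-processing steps can be sketched in plain Python and NumPy as follows. The in-memory sentences are hypothetical stand-ins for the two data files, and the cleaning regex is a simplified assumption, not the one used in the repository.

```python
import re
import numpy as np

def clean(sentence):
    # Simplified cleaning (an assumption): keep letters, digits and apostrophes,
    # then collapse whitespace and lowercase
    sentence = re.sub(r"[^A-Za-z0-9']", " ", sentence)
    return re.sub(r"\s+", " ", sentence).strip().lower()

# 1. Load: hypothetical in-memory stand-ins for rt-polarity.pos / rt-polarity.neg
positive = ["offers that rare combination of entertainment and education .",
            "a film well worth seeing ."]
negative = ["a dull and lifeless film ."]

# 2. Clean, and attach one-hot labels: positive -> [0, 1], negative -> [1, 0]
sentences = [clean(s).split() for s in positive + negative]
labels = np.array([[0, 1]] * len(positive) + [[1, 0]] * len(negative))

# 3. Pad every sentence to the length of the longest one with <PAD> tokens
max_len = max(len(s) for s in sentences)
padded = [s + ["<PAD>"] * (max_len - len(s)) for s in sentences]

# 4. Index: map each word to an integer so that each sentence becomes an integer vector
vocab = {"<PAD>": 0}
for s in padded:
    for w in s:
        vocab.setdefault(w, len(vocab))
x = np.array([[vocab[w] for w in s] for s in padded])
print(x.shape, labels.shape)
```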




Now that we have our data formatted as vectors we can feed them into our model. 
Integrating word2vec with CNN


So, when we created our word2vec model earlier, we dumped it into a binary file. Now it's time
to use that model as part of our CNN model. We do this by initializing the embedding weights W
with these values.


Since we trained our previous word2vec model on a very small corpus, let's instead choose a
model that was pre-trained on a huge corpus. A good strategy is to use the FastText embeddings,
which are trained on publicly available online documents for 294 languages
(https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md).


• We will download the English embedding (https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.en.zip).

• Extract the vocabulary and the embedding vectors into separate files.

• Load them in the train.py file.
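A minimal sketch of the extraction step: parse fastText's text .vec format (a header line with the word count and dimension, then one word followed by its values per line) and build an embedding matrix whose row i holds the vector of the word with index i in our vocabulary. The tiny in-memory file, the toy vocabulary and the random initialization range for words missing from fastText are assumptions for illustration.

```python
import io
import numpy as np

def load_vec(lines):
    """Parse fastText's text .vec format: header 'count dim', then 'word v1 v2 ...' per line."""
    _, dim = map(int, next(lines).split())
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.array(parts[1:], dtype="float32")
    return vectors, dim

def build_embedding(vocab, vectors, dim, seed=0):
    """Row i = vector of the word with index i; unknown words get random values (an assumption)."""
    rng = np.random.RandomState(seed)
    matrix = rng.uniform(-0.25, 0.25, size=(len(vocab), dim)).astype("float32")
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix

# Tiny hypothetical stand-in for the real .vec file
sample = io.StringIO("2 3\ngood 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
vectors, dim = load_vec(iter(sample))
vocab = {"<PAD>": 0, "good": 1, "bad": 2, "unseen": 3}
embedding = build_embedding(vocab, vectors, dim)
# np.save("fasttext_embedding.npy", embedding)  # saved once, later read back with np.load
```

With the real data you would pass an open text file (assuming the archive's text file is named wiki.en.vec) instead of the StringIO object; iterating line by line avoids reading the whole multi-gigabyte file into memory at once.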


That's it: by introducing this step, we can now feed the embedding layer with the pre-trained
word vectors. This additional information provides enough features to improve the learning
process of the CNN model.
Executing the Model


Now it's time to train our model with the provided dataset and the pre-trained embedding model.
A few hyperparameters will need fine-tuning to achieve good results, but once we have executed
the train.py file with reasonably good configurations, we can demonstrate that the model
classifies positive and negative sentences well.


As we can see in the following figure, the accuracy tends towards 1 and the loss decreases
towards 0 over the iterations.


[Plot panel: accuracy_1]





Figure 3.9 Plot of the performance metrics accuracy and loss of the CNN model during training 
process. 


Voila! We just used the pre-trained embedding model to train our CNN classifier, reaching an
average loss of 6.9 and an accuracy of 72.6%.


Once model training completes successfully, it produces:

• The checkpoints, stored in the /runs/ folder. We will use these checkpoints to make predictions.

• A summary with all the loss, accuracy, histogram and gradient value distributions captured
during the training process, which you can visualize using TensorBoard.
Deploy the model into production
Now that our model binaries are stored in the /runs/ folder, we just need to write a RESTful API.
You can use Flask for this and call the sentiment_engine() function defined in the
model_inference.py code.


Always make sure that you use the checkpoints of the best model and the correct embedding file,
which are defined as follows:


checkpoint_dir = "./runs/1508847544/" 
embedding = np.load('fasttext_embedding.npy' ) 
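A minimal Flask sketch of such an API follows. The /sentiment route, the JSON request and response shapes, and the stub sentiment_engine() are hypothetical: the stub only stands in for the real function in model_inference.py, whose signature and return value may differ, and a real service would load the checkpoints and embedding file shown above once at startup.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def sentiment_engine(text):
    # Hypothetical stand-in for sentiment_engine() from model_inference.py;
    # the real function runs the trained CNN on the text
    return "positive" if "good" in text.lower() else "negative"

@app.route("/sentiment", methods=["POST"])
def sentiment():
    # Expect a JSON body such as {"text": "The food was good"}
    payload = request.get_json(force=True)
    text = payload.get("text", "")
    return jsonify({"text": text, "sentiment": sentiment_engine(text)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

A client would then POST a JSON body to http://localhost:5000/sentiment and read the sentiment field of the response.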
Summary


Today's project was to build a deep learning computational linguistics model using word2vec
to accurately classify text in a sentiment analysis paradigm. Our hypothetical use case was to 
apply deep learning to enable the management of a restaurant chain to understand the general 
sentiment of text responses their customers made in response to a phone text question asking 
about their experience after dining. Our specific task was to build the natural language 
processing model that would build the business intelligence from the data obtained in this simple 
(hypothetical) application. 


Revisit Success Criteria: How did we do? Did we succeed? What is the impact of success? 
Just as we defined success at the beginning of the project, these are the key questions we ask as 
Deep Learning Data Scientists as we look to wrap up a project. 

Our Convolutional Neural Network (CNN) model, built on the pre-trained word embeddings
described earlier in the chapter, reached an accuracy of 72.6%! This means that we were able to
classify most of the unstructured text sentences correctly as positive or negative.


What are the implications of this accuracy? In our hypothetical example, this means that we can 
take a body of data that is difficult to summarize outside of this deep learning NLP model and 
summarize it for actionable insights by the restaurant management. With summary data points of 
positive or negative sentiment to the questions asked in a phone text, the restaurant chain can 
track performance over time and make adjustments and possibly even reward the staff for 
improvements. 


In the project in this chapter, we learned how to build word2vec models and analyze what
characteristics they learn from the provided corpus. We also learned how to build a language
model using a CNN and the trained word embeddings.


Finally, we looked at the model's performance in testing and determined whether we succeeded in
achieving our goals. In the next chapter's project, we're going to leverage even more of our
computational linguistics skills to create a natural language pipeline that will power a
chatbot for open-domain question answering. This is exciting work; let's see what's next!
Build NLP pipeline for building chatbots


Our project has expanded once again based on the good work that we've been doing. We started 
off working for a restaurant chain to help them classify handwritten digits for a text notification 
system to alert their waiting guests that their table was ready. Based on this success and when 
the owners realized that their customers were actually responding to the texts, we were asked to 
contribute a deep learning solution using natural language processing to accurately classify text 
into a meaningful sentiment category that would give the owners an indication as to the 
satisfaction with the dining experience. 


Do you know what happens to Deep Learning Engineers that do good work? They get asked to 
do more! :) 


This project for our next business use case is pretty cool. What we’re being asked to do is to create a
natural language pipeline that would power a chatbot for open domain question answering. The 
(hypothetical) restaurant chain has a website with their menu, history, location, hours, and other 
information and they would like the added ability for a website visitor to ask a question in a 
query box and our deep learning NLP chatbot to find the relevant information and present that 
back. They think that getting the right information back to the website visitor quickly would 
help drive in-store visits and improve the general customer experience. 


Named Entity Recognition (NER) will be the approach we'll use, and it will give us the power to
quickly classify the input text, which we can then match to the relevant content for a response.
It's a great way to take advantage of a large, changing corpus of unstructured data without
using hard-coded heuristics.


In this chapter, we will learn about the building blocks of an NLP model, such as pre-processing,
tokenization, and part-of-speech tagging. We use this understanding to build a system able to
read unstructured text and comprehend the answer to a specific question. We also describe how
to include this deep learning component in a classic NLP pipeline for information retrieval, to
provide an open-domain question answering system that does not require a structured knowledge base.


We will: 


1. Build a basic FAQ-based chatbot using statistical modeling in a framework capable of
detecting intents and entities for open-domain question answering.

2. Learn to generate dense representation of sentences. 

3. Build a Document Reader for extracting answers from unstructured text. 

4. Learn how to integrate deep learning models into a classic NLP pipeline. 


Define the Goal:

To build a chatbot that understands the context (intent) and can also extract entities, we need
an NLP pipeline that can perform intent classification along with NER extraction and then
provide an accurate response.
Skills learned will be:
You will learn how to build an open-domain question answering system using a classic NLP
pipeline, with a document reader component that uses deep learning techniques to generate
sentence representations.


Let's get started! 
Basics of NLP pipeline


Textual data is a very large source of information, and dealing with it properly is crucial to
success. To handle this text data, we need some basic text processing steps.


Most of the processing steps covered in this section are commonly used in NLP and involve 
combining a number of steps into one executable flow. This is what we refer to as the NLP 
pipeline. 


This flow can be a combination of tokenization, stemming, word frequency, parts of speech 
tagging, and many more. 


Let's look into the details of how to implement the steps in the NLP pipeline and specifically
what each processing step does. We will use the Natural Language Toolkit (NLTK) package, an
NLP toolkit written in Python.


Code Link: https://github.com/PacktPublishing/Python-Deep-Learning- 
Projects/blob/master/Chapter4/BasicNLPPipeline.ipynb 


import nltk 
nltk.download('punkt' ) 
nltk.download('averaged_perceptron_tagger' ) 
Tokenisation


Tokenisation separates a corpus into sentences or words or tokens. Tokenization is needed to 
make our texts ready for further processing and is the first step in creating an NLP Pipeline. 
A token can vary according to the task we are performing or the domain in which we are 
working, so keep an open mind as to what you consider as a token! 


Know the Code: NLTK is powerful because much of the hard coding work is already done in the
library. You can read more about NLTK tokenization here:

http://www.nltk.org/api/nltk.tokenize.html#nltk.tokenize.api.TokenizerI.tokenize_sents


Let's try to load a corpus and use NLTK tokenizer to first tokenize the raw corpus into sentences 
and then each sentence further into words: 


text = u"""
Dealing with textual data is very crucial so to handle these text data we need some
basic text processing steps. Most of the processing steps covered in this section are
commonly used in NLP and involve the combination of several steps into a single
executable flow. This is usually referred to as the NLP pipeline. These flow
can be a combination of tokenization, stemming, word frequency, parts of
speech tagging, etc.
"""

# Sentence tokenization
sentences = nltk.sent_tokenize(text)

# Word tokenization
words = [nltk.word_tokenize(s) for s in sentences]


OUTPUT: 

SENTENCES: 

[u'\nDealing with textual data is very crucial so to handle these text data we need some \nbasic text 
processing steps.', 

u'Most of the processing steps covered in this section are \ncommonly used in NLP and involve the 
combination of several steps into a single \nexecutable flow.', 

u'This is usually referred to as the NLP pipeline.', 

u'These flow \ncan be a combination of tokenization, stemming, word frequency, parts of \nspeech 
tagging, etc.'] 


WORDS: 

[[u'Dealing', u'with', u'textual', u'data', u'is', u'very', u'crucial', u'so', u'to', u'handle', 
u'these', u'text', u'data', u'we', u'need', u'some', u'basic', u'text', u'processing', u'steps', u'.'], 
[u'Most', u'of', u'the', u'processing', u'steps', u'covered', u'in', u'this', u'section', u'are', 
u'commonly', u'used', u'in', u'NLP', u'and', u'involve', u'the', u'combination', u'of', u'several', 
u'steps', u'into', u'a', u'single', u'executable', u'flow', u'.'], [u'This', u'is', u'usually', 
u'referred', u'to', u'as', u'the', u'NLP', u'pipeline', u'.'], [u'These', u'flow', u'can', u'be', u'a', 
u'combination', u'of', u'tokenization', u',', u'stemming', u',', u'word', u'frequency', u',', u'parts', 
u'of', u'speech', u'tagging', u',', u'etc', u'.']] 
Part of speech tagging


Some words have multiple meanings; for example, charge can be a noun and it can also be a
verb. Knowing a word's part of speech can help disambiguate its meaning. Each token in a sentence
has several attributes we can use for our analysis. The part of speech of a word is one example:
nouns are a person, place, or thing; verbs are actions or occurrences; adjectives are words that
describe nouns. Using these attributes, it’s straightforward to create a summary of a piece of text
by counting the most common nouns, verbs, and adjectives:


tagged_wt = [nltk.pos_tag(w) for w in words]


[[('One', 'CD'), ('way', 'NN'), ('to', 'TO'), ('extract', 'VB'), ('meaning', 'VBG'), ('from', 'IN'),
('text', 'NN'), ('is', 'VBZ'), ('to', 'TO'), ('analyze', 'VB'), ('individual', 'JJ'), ('words', 'NNS'),
('.', '.')], [('The', 'DT'), ('processes', 'NNS'), ('of', 'IN'), ('breaking', 'VBG'), ('up', 'RP'),
('a', 'DT'), ('text', 'NN'), ('into', 'IN'), ('words', 'NNS'), ('is', 'VBZ'), ('called', 'VBN'),
('tokenization', 'NN'), ('--', ':'), ('the', 'DT'), ('resulting', 'JJ'), ('words', 'NNS'), ('are',
'VBP'), ('referred', 'VBN'), ('to', 'TO'), ('as', 'IN'), ('tokens', 'NNS'), ('.', '.')],
[('Punctuation', 'NN'), ('marks', 'NNS'), ('are', 'VBP'), ('also', 'RB'), ('tokens', 'NNS'), ('.',
'.')], [('Each', 'DT'), ('token', 'NN'), ('in', 'IN'), ('a', 'DT'), ('sentence', 'NN'), ('has', 'VBZ'),
('several', 'JJ'), ('attributes', 'IN'), ('we', 'PRP'), ('can', 'MD'), ('use', 'VB'), ('for', 'IN'),
('analysis', 'NN'), ('.', '.')]]


# Keep only the POS tag (v) from each (word, tag) pair
patternPOS = []
for tag in tagged_wt:
    patternPOS.append([v for k, v in tag])


[['CD', 'NN', 'TO', 'VB', 'VBG', 'IN', 'NN', 'VBZ', 'TO', 'VB', 'JJ', 'NNS', '.'], ['DT', 'NNS', 'IN',
'VBG', 'RP', 'DT', 'NN', 'IN', 'NNS', 'VBZ', 'VBN', 'NN', ':', 'DT', 'JJ', 'NNS', 'VBP', 'VBN', 'TO',
'IN', 'NNS', '.'], ['NN', 'NNS', 'VBP', 'RB', 'NNS', '.'], ['DT', 'NN', 'IN', 'DT', 'NN', 'VBZ', 'JJ',
'IN', 'PRP', 'MD', 'VB', 'IN', 'NN', '.'], ['DT', 'NN', 'IN', 'NN', 'IN', 'DT', 'NN', 'VBZ', 'CD', 'NN',
..., 'NNS', 'VBP', 'NNS', 'IN', 'NN', 'NNS', '.'], ['VBG', 'DT', 'NNS', ',', 'PRP', 'VBZ', 'JJ', 'TO', 'VB',
'DT', ..., 'DT', 'NN', 'IN', 'NN', 'IN', 'VBG', 'DT', 'RBS', 'JJ', 'NNS', ',', 'NNS', ',', 'CC',
'NNS', '.']]
Extracting Nouns


Let's extract all the nouns present in the corpus. This is a very useful practice when you want to extract something specific. We use the NN, NNS, NNP, and NNPS tags to extract the nouns.


# Keep the words whose Penn Treebank tag is one of the noun tags
nouns = []
for tag in tagged_wt:
    nouns.append([k for k, v in tag if v in ['NN', 'NNS', 'NNP', 'NNPS']])


[['way', 'text', 'words'], ['processes', 'text', 'words', 'tokenization', 'words', 'tokens'],
['Punctuation', 'marks', 'tokens'], ['token', 'sentence', 'analysis'], ['part', 'speech', 'word',
'example', 'nouns', 'person', 'place', 'thing', 'verbs', 'actions', 'occurences', 'adjectives', 'words',
'describe', 'nouns'], ['attributes', 'summary', 'piece', 'text', 'nouns', 'verbs', 'adjectives']]
Extracting Verbs


Let's extract all the verbs present in the corpus. This time, we use the verb tags VB, VBD, VBG, VBN, VBP, and VBZ.


# Keep the words whose Penn Treebank tag is one of the verb tags
verbs = []
for tag in tagged_wt:
    verbs.append([k for k, v in tag if v in ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']])


[['extract', 'meaning', 'is', 'analyze'], ['breaking', 'is', 'called', 'are', 'referred'], ['are'],
['has', 'use'], ['is', 'are', 'are', 'are'], ['Using', "'s", 'create', 'counting']]
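The two extractions above can be folded into the summary described at the start of this section. Every Penn Treebank noun tag starts with NN, every verb tag with VB, and every adjective tag with JJ, so a helper can group tags by prefix and count the words. This is a minimal sketch: `summarize_pos` is a hypothetical helper, and the tagged sentences are copied from the `nltk.pos_tag` output shown earlier so the example runs without NLTK.

```python
from collections import Counter

# Two tagged sentences copied from the nltk.pos_tag output shown earlier
tagged_sentences = [
    [("One", "CD"), ("way", "NN"), ("to", "TO"), ("extract", "VB"),
     ("meaning", "VBG"), ("from", "IN"), ("text", "NN"), ("is", "VBZ"),
     ("to", "TO"), ("analyze", "VB"), ("individual", "JJ"),
     ("words", "NNS"), (".", ".")],
    [("Punctuation", "NN"), ("marks", "NNS"), ("are", "VBP"),
     ("also", "RB"), ("tokens", "NNS"), (".", ".")],
]

def summarize_pos(tagged, top_n=3):
    """Count the most common nouns, verbs and adjectives by tag prefix."""
    groups = {"nouns": "NN", "verbs": "VB", "adjectives": "JJ"}
    summary = {}
    for name, prefix in groups.items():
        # Lowercase so "Text" and "text" are counted together
        counts = Counter(word.lower() for sent in tagged
                         for word, tag in sent if tag.startswith(prefix))
        summary[name] = counts.most_common(top_n)
    return summary

print(summarize_pos(tagged_sentences))
```

Matching on prefixes avoids listing every tag variant by hand (for example NNP or JJS), at the cost of being tied to the Penn Treebank tag set.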


Now, let's use spaCy to tokenize a piece of text and access the part-of-speech attribute of each token. As an example application, we’ll tokenize the previous paragraph and count the most common nouns with the following code. We’ll also lemmatize the tokens, which gives the root form of a word and helps us standardize across different forms of the same word:


! pip install -q spacy 
! pip install -q tabulate 
! python -m spacy download en_core_web_lg 


from collections import Counter 
import spacy 

from tabulate import tabulate 

nlp = spacy.load('en_core_web_lg') 


doc = nlp(text) 
noun_counter = Counter(token.lemma_ for token in doc if token.pos_ == 'NOUN') 


print (tabulate(noun_counter.most_common(5), headers=['Noun', 'Count'])) 


step 3 
combination 2 
text 2 
processing 2 
datum 2 
Dependency Parsing


Dependency parsing is a way to understand the relationships between words in a sentence. Dependency relations are finer-grained attributes that help build the model's understanding of words through their relationships to the other words in the sentence.


doc = nlp(sentenses[2]) 
spacy.displacy.render(doc,style='dep', options={'distance' : 140}, jupyter=True) 


These relationships between words can get complicated, depending on how sentences are 
structured. The result of dependency parsing a sentence is a tree data structure, with the verb as 
the root as shown in the figure below. 







[Figure: displaCy dependency parse of "This is usually referred to as the NLP pipeline." (POS tags: DET VERB ADV VERB ADP ADP DET PROPN NOUN; arc labels include nsubjpass, auxpass, prep, pobj, det, and compound)]
Named Entity Recognition


Finally, there’s named entity recognition. Named entities are, roughly, the proper nouns of a sentence. Computers have gotten pretty good at detecting whether they appear in a sentence and at classifying what type of entity they are. spaCy handles named entity recognition at the document level, since the name of an entity can span several tokens:

doc = nlp(u"My name is Jack and I live in India.") 


entity_types = ((ent.text, ent.label_) for ent in doc.ents) 
print(tabulate(entity_types, headers=['Entity', 'Entity Type'])) 


Output: 

Entity Entity Type 
Jack PERSON 
India GPE 


We have now seen some of the basic building blocks of an NLP pipeline. These pipelines are used consistently across NLP projects, whether in machine learning or in the deep learning space.


Does Something Look Familiar?

We used a few of these NLP pipeline building blocks in the previous chapter to build our word2vec models. This more in-depth explanation of the building blocks of the NLP pipeline helps us take the next step in our projects as we deploy more and more complex models!

As with everything in this book, we encourage you to also try your own combinations of the above processes for the use cases you work on in your data science career. Now let's implement a chatbot using these pipelines!
Building conversational bots


In this section, we will learn a basic statistical modeling approach to building an information retrieval system using TF-IDF, which we can combine with the NLP pipelines to build fully functional chatbots. Later, we will learn to build a much more advanced conversational bot that uses NER to extract specific pieces of information, such as a location or a time.




What is TF-IDF? 


TF-IDF (term frequency-inverse document frequency) is a way to represent documents as feature vectors. It combines two quantities. The term frequency (tf) is the count of how often a particular word occurs in a given document. The inverse document frequency (idf) shrinks as the number of documents containing the word grows. Multiplying them down-weights terms the more documents they occur in. The idea is that terms that occur in many different documents are likely to be unimportant and carry little useful information for NLP tasks such as document classification.
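The weighting described above can be sketched from scratch in a few lines. This is a minimal illustration of the textbook formula tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. `tf_idf` is a hypothetical helper, and lowercasing plus whitespace splitting is a simplifying assumption. scikit-learn's TfidfVectorizer, used later in this section, applies a smoothed idf and normalizes each row, so its numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts in this document
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights

docs = ["what is the cost of an americano",
        "what is today's special",
        "when does your shop open"]
w = tf_idf(docs)
print(round(w[0]["americano"], 3))  # 1.099, i.e. log(3): appears in 1 of 3 docs
print(w[0]["what"] < w[0]["americano"])  # True: "what" appears in 2 of 3 docs
```

A term that appears in every document gets log(N / N) = 0, which is the "down-weighting" taken to its extreme.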




Preparing dataset 


To build a chatbot with the TF-IDF approach, we first need a data structure that pairs training data with labels. Take the example of a chatbot built to answer questions from users. Using historical data, we can form a dataset with two columns: the first is the question and the second is the answer to that question, as shown in the following table:


Question                             Answer
When does your shop open?            Our shop timings are 9:00 am - 9:00 pm
What is today's special?             Today we have a variety of Italian past...
What is the cost of an americano?    Americano with a single shot will cost 1.4$ and the double shot will cost 2.3$.
Do you sell Ice-creams?              We do have desserts like ice-cream, bri...





Let’s treat the previous example as a sample dataset. It is very small; in the original hypothetical scenario, we would have a much larger dataset to work with. The typical process is as follows: the user interacts with the bot and types an arbitrary query about the store. The bot sends that query to the NLP engine via an API, and it is up to the NLP model to decide what to return for the new query (the test data). In our dataset, the questions are the training data and the answers are the labels. For a new query, the TF-IDF algorithm matches it to one of the stored questions with a confidence score, which tells us how close the user's question is to that stored question. The answer to the best-matching question is what our bot returns.


Let’s take the above example further. Suppose the user asks:

"Can I get an Americano, btw how much it will cost ?"

Words like ‘I’, ‘an’, and ‘it’ tend to occur frequently in other questions as well, so TF-IDF gives them little weight. If we match the remaining, more informative words, we find that this question is closest to: "What is the cost of an americano?"


So our bot will respond with the historical answer to this question:

“Americano with a single shot will cost 1.4$ and the double shot will cost 2.3$.”
Implementation


After creating the tabular data structure described above, we compute the predicted answer every time a user queries our bot. First, we load all the question-answer pairs from the dataset. Let's load our CSV file using pandas and perform some preprocessing on the dataset:


Code Link: https://github.com/PacktPublishing/Python-Deep-Learning-Projects/tree/master/Chapter4/tfidf_version


import numpy as np
import pandas as pd

# Load the question/answer pairs: column 0 holds questions, column 1 answers
filepath = 'sample_data.csv'
csv_reader = pd.read_csv(filepath)

question_list = csv_reader[csv_reader.columns[0]].values.tolist()
answers_list = csv_reader[csv_reader.columns[1]].values.tolist()

# A new user query (the test data)
query = 'Can I get an Americano, btw how much it will cost ?'
Creating Vectorizer


Now let's initialize the TF-IDF vectorizer and define a few parameters:

• min_df: When building the vocabulary, ignore terms that have a document frequency strictly lower than the given threshold.
• ngram_range: The range of n-gram sizes to extract; (2, 4) captures sequences of 2 to 4 consecutive words.
• strip_accents: Remove accents from characters during preprocessing.
• norm: The norm used to normalize each document vector (L1 or L2).
• encoding: The character encoding used to decode byte input, which handles Unicode characters.

There are many more parameters you can explore, configure, and play with.

from sklearn.feature_extraction.text import TfidfVectorizer

# min_df=1 has the same effect as min_df=0 (every term occurs in at least one
# document) and is accepted by all scikit-learn versions
vectorizer = TfidfVectorizer(min_df=1, ngram_range=(2, 4), strip_accents='unicode', norm='l2',
                             encoding='ISO-8859-1')


Now we fit the vectorizer on the questions:


# Learn the vocabulary and IDF weights from the training questions
X_train = vectorizer.fit_transform(question_list)

# Transform the query sent by the user to the bot (test data);
# transform() expects a list of documents, so wrap the single query
X_query = vectorizer.transform([query])
Process Query


To process the query, we compute its similarity to each training question by taking the dot product of the training data matrix with the transpose of the query vector. Because every row was L2-normalized, this dot product equals the cosine similarity:


XX_similarity=np.dot(X_train.todense(), X_query.transpose().todense() ) 


Next, we flatten the similarity scores between the query and each training question into a list:


XX_sim_scores = np.array(XX_similarity).flatten().tolist()
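The claim that the dot product of L2-normalized vectors equals their cosine similarity can be checked numerically. This is a minimal NumPy sketch; the two vectors are arbitrary examples, not TF-IDF output.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 0.0, 4.0])   # norm 5
b = np.array([1.0, 2.0, 2.0])   # norm 3
a_unit = a / np.linalg.norm(a)  # what norm='l2' does to each row
b_unit = b / np.linalg.norm(b)

print(cosine(a, b))                               # 11 / 15, about 0.733
print(np.isclose(a_unit @ b_unit, cosine(a, b)))  # True
```

This is why the ranking step can use a plain matrix product: the normalization has already been done by the vectorizer.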
Rank Results


We create a list of (question index, similarity) pairs for the query, sorted by similarity with the highest first:


import operator

# Map each training question's index to its similarity with the query
dict_sim = dict(enumerate(XX_sim_scores))

sorted_dict_sim = sorted(dict_sim.items(), key=operator.itemgetter(1), reverse=True)


Finally, we take the index of the most similar question from the sorted list and respond with the value at that index in the answers column. If no question is similar to the query at all, we return our default answer:


if sorted_dict_sim[0][1] == 0:
    print("Sorry I have no answer, please try asking again in a nicer way :)")
else:
    # Index of the most similar training question -> its stored answer
    print(answers_list[sorted_dict_sim[0][0]])
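Putting these steps together, the TF-IDF bot can be exercised end to end without a CSV file. This is a minimal sketch, not the repository's code. The inline question/answer lists stand in for sample_data.csv, the second and fourth answers are placeholders because the table above is truncated, and `answer` is a hypothetical helper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical inline stand-in for sample_data.csv
questions = [
    "When does your shop open?",
    "What is today's special?",
    "What is the cost of an americano?",
    "Do you sell Ice-creams?",
]
answers = [
    "Our shop timings are 9:00 am - 9:00 pm",
    "<today's special answer>",  # placeholder
    "Americano with a single shot will cost 1.4$ and the double shot will cost 2.3$.",
    "<dessert answer>",          # placeholder
]
FALLBACK = "Sorry I have no answer, please try asking again in a nicer way :)"

vectorizer = TfidfVectorizer(min_df=1, ngram_range=(2, 4),
                             strip_accents='unicode', norm='l2')
X_train = vectorizer.fit_transform(questions)

def answer(query):
    """Return the stored answer of the most similar question, or FALLBACK."""
    X_query = vectorizer.transform([query])
    # Rows are L2-normalized, so this sparse dot product is cosine similarity
    scores = (X_train @ X_query.T).toarray().ravel()
    best = int(np.argmax(scores))
    return FALLBACK if scores[best] == 0 else answers[best]

print(answer("Can I get an Americano, btw how much it will cost ?"))
print(answer("Where can I park my bike?"))
```

With ngram_range=(2, 4) only multi-word sequences are features, so the Americano query matches through the shared bigram "an americano", and a query that shares no bigram with any stored question gets the default answer.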
Advanced chatbots using NER


We just created a very basic chatbot that can understand the user's query and respond to the customer accordingly. But it is not yet capable of understanding context, because it cannot extract information such as product names, places, or other entities.


To build a bot which understands the context (intent) and can also extract the entities, we need 
an NLP pipeline which can perform intent classification along with NER extraction and then 
provide an accurate response. 


Keep your eyes on the goal! This is the goal of our open-domain question answering bot. 


To do that, we will use the open source project Rasa NLU (https://github.com/RasaHQ/rasa_nlu).


Rasa NLU is a Natural Language Understanding tool for understanding what is being said in short pieces of text. For example, take a short message like:


"I'm looking for an Italian restaurant in the center of town" 


The system then returns:


intent: search_restaurant 
entities: 

- cuisine : Italian 

- location : center of town 


So by harnessing the power of RASA, we can build a chatbot which can do intent classification 
and NER extraction. 


Great, let's do it. 


Code Link: https://github.com/PacktPublishing/Python-Deep-Learning-Projects/tree/master/Chapter4/rasa_version
Installing Rasa


Let's install Rasa NLU on our local machine or server using the following commands:


pip install rasa_nlu 
pip install coloredlogs sklearn_crfsuite spacy 
python -m spacy download en 


If the installation fails, you can follow the more detailed approach at https://nlu.rasa.com/installation.html.


Rasa supports a variety of NLP pipelines, such as spaCy, sklearn, or MITIE. You can use any one of them, or build your own custom pipeline that includes deep models, such as a CNN with the word2vec embeddings we created in the previous chapter. In our case, we will use spaCy with sklearn.
Preparing dataset


In our previous approach, we created a dataset in a CSV file with two columns holding question-answer pairs. We need to do this again, but in a different format. This time, we need each question associated with its intent, as shown in the following screenshot: the query ‘hello’ is labeled with the intent ‘greet’. Similarly, we label all the questions with their respective intents.

Once all the questions and intents are ready, we need to label the entities. In this case, as shown in the figure, we have a ‘location’ entity with the value ‘centre’ and a ‘cuisine’ entity with the value ‘Mexican’:
Intents              Queries
greet                hello
greet                hi
restaurant_search    i'm looking for a place to eat
restaurant_search    i'm looking for a place in the north of town
restaurant_search    show me chinese restaurants
restaurant_search    show me a mexican place in the centre
                     (entities: location = centre, cuisine = mexican)
affirm               yes
affirm               yep
affirm               yeah
goodbye              bye





To feed this data to Rasa, we need to store it in a specific JSON format, which looks like this:

# intent_list : Only intent part
[
    {
        "text": "hey",
        "intent": "greet"
    },
    {
        "text": "hello",
        "intent": "greet"
    }
]

# entity_list : Intent with entities
[
    {
        "text": "show me indian restaurants",
        "intent": "restaurant_search",
        "entities": [
            {
                "start": 8,
                "end": 14,
                "value": "indian",
                "entity": "cuisine"
            }
        ]
    }
]

Here start and end are character offsets into text, with end exclusive, so characters 8 to 13 spell "indian".


The final version of the JSON should have this structure:

{
    "rasa_nlu_data": {
        "entity_examples": [ ...entries of entity_list... ],
        "intent_examples": [ ...entries of intent_list... ]
    }
}
To make this simpler, there is a tool in which you can enter and annotate all the data and download the JSON version of it. You can run the editor locally by following the instructions at https://github.com/RasaHQ/rasa-nlu-trainer, or simply use the online version at https://rasahq.github.io/rasa-nlu-trainer/.
Save this JSON file as restaurant.json in the current working directory.
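Entity offsets are easy to get wrong when an example's text is edited by hand, so the training file can also be generated. This is a minimal sketch that follows the entity_examples/intent_examples structure shown in this section; `entity_example` is a hypothetical helper, and the key names are taken from this section rather than checked against a specific Rasa NLU release.

```python
import json

def entity_example(text, intent, value, entity):
    """Build one labeled example, computing character offsets for `value`."""
    start = text.index(value)  # raises ValueError if value is not in text
    return {
        "text": text,
        "intent": intent,
        "entities": [{"start": start, "end": start + len(value),
                      "value": value, "entity": entity}],
    }

intent_list = [
    {"text": "hey", "intent": "greet"},
    {"text": "hello", "intent": "greet"},
]
entity_list = [
    entity_example("show me indian restaurants",
                   "restaurant_search", "indian", "cuisine"),
]
training_data = {"rasa_nlu_data": {"entity_examples": entity_list,
                                   "intent_examples": intent_list}}

with open("restaurant.json", "w") as f:
    json.dump(training_data, f, indent=2)

print(entity_list[0]["entities"][0])  # start 8, end 14 for "indian"
```

Deriving the offsets from the text keeps start and end consistent even when the example sentence changes.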
Train the model


Now we’re going to create a configuration file. This configuration file will define the pipeline 
that is to be used in the process of training and building of the model. 


Create a file called config_spacy.yml in your working directory, which looks like this:


language: "en" 
pipeline: "spacy_sklearn" 
fine_tune_spacy_ner: true 


Know the Code: The pipeline configuration is customizable for a reason. Other data scientists have found it useful to change these values, and it's good practice to explore them as you become more familiar with this technology. There is a long list of configuration options, which you can look into at https://nlu.rasa.com/config.html.


This configuration states that we will be using English language models and the pipeline running 
in the backend will be spaCy with the combination of sklearn. Now to begin the training process 
execute the following command: 


python -m rasa_nlu.train \ 
--config config_spacy.yml \ 
--data restaurant.json \ 
--path projects 


This command takes the configuration file and the training data file as input; the --path parameter is the location where the trained model will be stored.


Once the model training process is completed, you’ll see a new folder named projects/default/model_YYYYMMDD-HHMMSS, with the timestamp of when training finished. The complete project structure will look as seen in the following screenshot:


config_spacy.yml
projects/
    default/
        model_20180523-213216/
            crf_model.pkl
            intent_classifier_sklearn.pkl
            metadata.json
            training_data.json
rasa_engine.py
restaurant.json
Deploy the model


Now it's time to make your bot go live! With Rasa, you don't need to write any API services; everything is available in the package itself. To expose the trained model as a service, execute the following command, which takes the path of the stored trained model:


python -m rasa_nlu.server --path projects 


If everything goes fine, a RESTful API will be exposed on port 5000, and you will see this log on the console:


2018-05-23 21:34:23+0530 [-] Log opened. 
2018-05-23 21:34:23+0530 [-] Site starting on 5000 
2018-05-23 21:34:23+0530 [-] Starting factory <twisted.web.server.Site instance at 0x1062207e8> 


To access the API, you can use the following command. Here we query the model with the question "I am looking for Mexican food":


curl -X POST localhost:5000/parse -d '{"q":"I am looking for Mexican food"}' | python -m json.tool 


Output:
{
    "entities": [
        {
            "confidence": 0.5348393725109971,
            "end": 24,
            "entity": "cuisine",
            "extractor": "ner_crf",
            "start": 17,
            "value": "mexican"
        }
    ],
    "intent": {
        "confidence": 0.7584285478135262,
        "name": "restaurant_search"
    },
    "intent_ranking": [
        {
            "confidence": 0.7584285478135262,
            "name": "restaurant_search"
        },
        {
            "confidence": 0.11009204166074991,
            "name": "goodbye"
        },
        {
            "confidence": 0.08219245368495268,
            "name": "affirm"
        },
        {
            "confidence": 0.049286956840770876,
            "name": "greet"
        }
    ],
    "model": "model_20180523-213216",
    "project": "default",
    "text": "I am looking for Mexican food"
}
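The same request can be made from Python, and the response reduced to what a bot usually needs: the top intent and a mapping of entity names to values. This is a minimal sketch using only the standard library. `query_rasa` and `summarize` are hypothetical helpers; the /parse endpoint and the {"q": ...} body are taken from the curl command above, and the offline sample is a trimmed copy of the response shown above.

```python
import json
import urllib.request

def query_rasa(text, url="http://localhost:5000/parse"):
    """POST a message to the running rasa_nlu server and return the JSON reply."""
    body = json.dumps({"q": text}).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def summarize(parse_result):
    """Reduce a parse response to (intent name, confidence, {entity: value})."""
    intent = parse_result["intent"]
    entities = {e["entity"]: e["value"] for e in parse_result.get("entities", [])}
    return intent["name"], intent["confidence"], entities

# Offline example using part of the response shown above
sample = {
    "entities": [{"confidence": 0.5348393725109971, "end": 24,
                  "entity": "cuisine", "extractor": "ner_crf",
                  "start": 17, "value": "mexican"}],
    "intent": {"confidence": 0.7584285478135262, "name": "restaurant_search"},
    "text": "I am looking for Mexican food",
}
print(summarize(sample))
# With the server running:
# print(summarize(query_rasa("I am looking for Mexican food")))
```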


Here we can see that the model handles this query well, for both intent classification and entity extraction. It classifies the intent as restaurant_search with a confidence of about 0.758 (75.8%), and it detects the cuisine entity with the value mexican.
Serving chatbots


Now that we have seen how to build chatbots using two methods, TF-IDF and Rasa NLU, let's expose both of them as an API. The architecture of this simple chatbot framework looks like this:


[Figure: simple chatbot framework architecture with a User Interface and an Algorithms layer containing the TFIDF Engine and the RASA Engine]


Server setup 


Refer to the repository (https://github.com/PacktPublishing/Python-Deep-Learning-Projects/tree/master/Chapter4) for the API code, and look into the chatbot_api.py file. We have implemented a common API that can load both versions of the bot, and you can build a whole framework on top of it.





To serve the APIs, follow these steps:

1. Enter the chapter directory using the following command: cd chapter\ 4/
2. Start the Rasa server, which exposes the RASA module at localhost:5000 (if you have not trained the Rasa engine yet, follow the training steps above first): python -m rasa_nlu.server --path ./rasa_version/projects
3. In a separate console, execute the following command, which exposes an API at localhost:8080:
python chatbot_api.py
Now your chatbot is ready to be accessed via the API:

Call the following API to execute the TF-IDF version:
curl -G "http://localhost:8080/version1" --data-urlencode "query=Can I get an Americano"
Call the following API to execute the RASA version:
curl -G "http://localhost:8080/version2" --data-urlencode "query=where is Indian cafe"
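Because the query contains spaces, it must be URL-encoded before it is sent to either endpoint. This is a minimal Python client sketch using only the standard library. `bot_url` and `ask` are hypothetical helpers; the port and the /version1 and /version2 paths come from the steps above, and the response format of chatbot_api.py is not assumed, so `ask` simply returns the raw body text.

```python
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080"  # where chatbot_api.py listens

def bot_url(version, query):
    """Build a URL for /version1 (TF-IDF) or /version2 (Rasa) with an encoded query."""
    # quote_via=quote encodes spaces as %20 rather than '+'
    return f"{BASE_URL}/{version}?" + urlencode({"query": query}, quote_via=quote)

def ask(version, query):
    """Call the running API and return the raw response body as text."""
    with urllib.request.urlopen(bot_url(version, query)) as resp:
        return resp.read().decode("utf-8")

print(bot_url("version1", "Can I get an Americano"))
# http://localhost:8080/version1?query=Can%20I%20get%20an%20Americano
```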
Summary


In this project, we were asked to create a natural language pipeline that would power a chatbot for open-domain question answering. A (hypothetical) restaurant chain has a lot of text-based data on its website, including its menu, history, location, hours, and other information. It would like website visitors to be able to ask a question in a query box and have our deep learning NLP chatbot find the relevant information and present it back.


We started by showing how we could build a simple FAQ chatbot that took in arbitrary queries, matched them to predefined questions, and output a response with a confidence score indicating the similarity between the input question and the question in our database. But this was only a stepping stone to our real goal: a chatbot that could capture the intent of a question and prepare an appropriate response.


We explored a Named Entity Recognition (NER) approach to give us the added power we needed to quickly classify input text, which we could then match to relevant content for a response. This approach fit our goal of allowing open-domain question answering and of taking advantage of a large, changing corpus of unstructured data without relying on hardcoded heuristics (as in our hypothetical restaurant example).


We learned to use the building blocks of the NLP pipeline, such as preprocessing, tokenizing, and part-of-speech tagging. We used this understanding to build a system able to read unstructured text and comprehend an answer to a specific question. Specifically, we gained these skills in this project:


1. How to build a basic FAQ based chatbot using statistical modeling in a framework capable 
of detecting intents and entities for Open-domain question-answering. 

2. Learned to generate a dense representation of sentences. 

3. How to build a Document Reader for extracting answers from unstructured text. 

4. Learned how to integrate deep learning models into a classic NLP pipeline. 


These skills will come in very handy in your career as you encounter similar business use cases and as conversational user interfaces continue to gain popularity. Well done, and let's see what's in store for our next Deep Learning Projects in Python!
Sequence-to-sequence models for building chatbots


We're learning a lot and doing some valuable work! In the evolution of our hypothetical business use case, this chapter builds directly on the previous chapter, where we created our Natural Language Processing pipeline. The skills we have learned so far in computational linguistics should give us the confidence to expand past the training examples in this book and tackle this next project. We're going to build a more advanced chatbot for our hypothetical restaurant chain to automate the process of fielding call-in orders.


This requirement would mean combining a number of technologies that we've learned so far. But for this project, we're interested in learning how to make a chatbot that is more contextually aware and robust, one that we could integrate into a larger system in this hypothetical scenario. By demonstrating mastery on this training example, we'll have the confidence to execute this in a real situation.


In the previous chapters, we learned about representational learning methods like word2vec and about using them in combination with a type of deep learning algorithm called a convolutional neural network (CNN). But there are a few constraints when using CNNs to build language models:

CNNs do not preserve state information from one step of a sequence to the next.
The inputs and outputs need to be of a fixed length.
CNNs are sometimes unable to adequately handle complex context.
By contrast, RNNs do better at modeling sequential information.

To overcome these problems, there is an alternative class of algorithms specially designed to handle input data that comes in the form of sequences (such as a sequence of words or a sequence of characters). This class of algorithms is called Recurrent Neural Networks (RNNs).


In this chapter, we will: 


Learn about the RNN and its various forms 

Implement a language model using RNN 

Learn about the Long Short-Term Memory (LSTM) model

Implement the LSTM language model and compare it to the RNN 

Implement an Encoder-Decoder RNN based on the LSTM unit for a simple sequence to 
sequence question answer task 


Define the Goal: 
Build a more robust chatbot with memory to provide more contextually correct responses to 
questions. 


Let's get started! 
About RNNs


An RNN is a neural network architecture designed for sequential data. It reads the words or characters of a text one element at a time and extracts relevant features from them while carrying information forward from the elements it has already seen.


An RNN is a function that applies some non-linear transformation (known as the RNN cell or 
step) to every element (word/char) of a sequence. The output of an RNN layer is the output of 
the RNN cell applied on each element of the sequence. In the case of text, these are usually 
successive words or characters. 


NOTE: Each RNN cell holds an internal memory that summarizes the history of the sequence it has seen so far.
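The cell-by-cell computation described above fits in a few lines of NumPy. This is a minimal sketch of a vanilla (Elman-style) RNN forward pass, h_t = tanh(W_x x_t + W_h h_{t-1} + b), with random, untrained weights. `rnn_forward` and its parameter names are illustrative and are not taken from the book's TensorFlow code.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b, h0=None):
    """Apply the same tanh RNN cell to every element of a sequence.

    inputs: array of shape (seq_len, input_dim), e.g. one embedding per word.
    Returns the hidden state after each element, shape (seq_len, hidden_dim).
    """
    h = np.zeros(W_h.shape[0]) if h0 is None else h0
    states = []
    for x_t in inputs:
        # The new state mixes the current element with the previous state,
        # which summarizes everything seen so far (the cell's "memory")
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 3, 5
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

sequence = rng.normal(size=(seq_len, input_dim))
states = rnn_forward(sequence, W_x, W_h, b)
print(states.shape)  # (5, 3)
```

Because the same weights are reused at every step, the function accepts sequences of any length, unlike the fixed-length inputs mentioned for CNNs.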


[Figure: Unrolled RNN, contrasting a sequential structure with a recurrent structure (which can be a feature map, a cell, or any recurrent structure)]


The idea was to introduce a feedback structure for context modeling by using fixed-weight feedback connections. It's as if you were creating connections between the current feature map (an intermediate output, as in a CNN) and its previous self, like looking at your old photographs and learning from your younger self.


But there are problems: exploding and vanishing gradients made it a living nightmare to train such systems for complex time series problems. Another problem was that the recurrent structures captured either short-term temporal structure or long-term structure, but not both simultaneously. To solve this, the RNN cell is replaced by a gated cell, such as the Gated Recurrent Unit (GRU) (refer to http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-...on-and-theano/) or the Long Short-Term Memory (LSTM) cell (to learn more, refer to http://colah.github.io/posts/2015-08-Understanding-LSTMs/).





We'll explore the LSTM architecture in detail later in the chapter. But first, let's gain some intuition about its value, which will help us achieve our goal.
RNN Architectures


In 1998, Sepp Hochreiter’s work (https://dl.acm.org/citation.cfm?id=355233) addressed the vanishing and exploding gradient problems of simple recurrent networks by introducing gating and memory cells. We will mostly use the LSTM cell, since it has performed well on many NLP tasks. Long Short-Term Memory (LSTM) is an RNN architecture that addresses the problem of training over long sequences and retaining memory. LSTMs mitigate the gradient problem by introducing additional gates that control access to the cell state. Colah’s blog post (http://colah.github.io/posts/2015-08-Understanding-LSTMs/) is a great place to understand how LSTMs work.
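The gates described above can be written out directly. This is a minimal NumPy sketch of one LSTM step in the standard formulation (forget, input, and output gates plus a candidate update, without peepholes). The function name and the [forget, input, candidate, output] stacking order of the weights are assumptions of this sketch, not the book's TensorFlow code. The demonstration pins the forget gate open and the input gate shut to show the cell state being carried unchanged across many steps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the order [forget, input, candidate, output].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much old cell state to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new content to write
    g = np.tanh(z[2 * H:3 * H])    # candidate content
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g       # additive update of the cell state (memory)
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Forget gate pinned open (bias +20) and input gate pinned shut (bias -20):
# the cell state should survive 50 steps essentially unchanged.
D, H, steps = 4, 3, 50
rng = np.random.default_rng(0)
W = np.zeros((4 * H, D))
U = np.zeros((4 * H, H))
b = np.zeros(4 * H)
b[0:H] = 20.0       # sigmoid(20) is almost exactly 1
b[H:2 * H] = -20.0  # sigmoid(-20) is almost exactly 0

c0 = rng.normal(size=H)
h, c = np.zeros(H), c0.copy()
for x_t in rng.normal(size=(steps, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(np.allclose(c, c0, atol=1e-5))  # True
```

The additive cell update is what lets information (and gradients) pass through many steps when the forget gate stays open, which is the intuition behind LSTMs handling long sequences better than a plain tanh RNN.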


These small LSTM units can be combined in multiple forms to solve various kinds of use cases. RNNs are very flexible in how they combine input and output patterns:

• Many to One: Use the complete input sequence to make a single prediction, as in sentiment models.

• One to Many: Transform a single input into a sequence, like extracting "day", "month", and "year" from a given date.

• Many to Many: Also known as seq2seq; transform the entire input sequence into another sequence, as in Q/A systems.


[Figure: RNN input/output patterns: one to many, many to one, and many to many]


In this chapter, we will focus on Many to Many, also known as the sequence-to-sequence (seq2seq) architecture, to build a question-answer chatbot. Almost all RNN approaches to solving the seq2seq problem involve three major components:


1. Encoders: Encoding the input sentences into some abstract representation. 
2. Hidden Layer: Manipulating this encoding. 
3. Decoders: Decoding it to our target sequence. 


[Figure: "Hello, Can I get a coffee?" → Encoder Network → encoded message → Decoder Network → "Yes sure, would you like to have anything to eat?"]


We will look into more details later in this chapter. Let's first implement some basic forms of 
RNN models. 
Implementing basic RNN


In this section, we will implement a basic RNN model to perform sentiment classification.


The code files for the model can be found at https://github.com/PacktPublishing/Python-Deep-Learning-Projects/blob/master/Chapter%205/1.%20rnn.py.
Importing all the dependencies


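# The book's helper module (assumed to provide clearstring and build_dataset)
from utils import *

import time

import numpy as np
import sklearn.datasets
import tensorflow as tf
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split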
Preparing dataset


We will use an sklearn wrapper to load the dataset from raw files, and then use a helper function, separate_dataset(), to clean the dataset and transform it from its raw form into separate lists:


# Helper function: split each raw file into non-empty lines (one sentence
# per line), clean each line, and label every line with its file's target
def separate_dataset(trainset, ratio=0.5):
    datastring = []
    datatarget = []
    for i in range(int(len(trainset.data) * ratio)):
        data_ = trainset.data[i].split('\n')
        data_ = list(filter(None, data_))
        for n in range(len(data_)):
            data_[n] = clearstring(data_[n])
        datastring += data_
        for n in range(len(data_)):
            datatarget.append(trainset.target[i])
    return datastring, datatarget


Here, trainset is an object which stores all the text data and the sentiment label data: 


trainset = sklearn.datasets.load_files(container_path='./data', encoding='UTF-8')
trainset.data, trainset.target = separate_dataset(trainset, 1.0)

print(trainset.target_names)
print('No. of sentences', len(trainset.data))
print('No. of labels', len(trainset.target))


Output:

['negative', 'positive']
No. of sentences 10662
No. of labels 10662


Now we will transform the labels into the one hot encoding. 


It's important to understand the dimensions of the one-hot encoded matrix. Since we have 10,662
separate sentences and 2 sentiments (negative and positive), our one-hot matrix will be of
shape [10662, 2].
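To see why the fancy-indexing line used below produces a [number of sentences, 2] matrix, here is a toy run on four made-up labels (not the real data). Pairing each row index with that row's label sets exactly one 1.0 per row:

```python
import numpy as np

# Toy labels standing in for trainset.target (0 = negative, 1 = positive)
labels = np.array([1, 0, 0, 1])
num_classes = 2

# One row per sentence, one column per class
onehot = np.zeros((len(labels), num_classes))
# Pair row i with column labels[i] and set that single cell to 1.0
onehot[np.arange(len(labels)), labels] = 1.0

print(onehot.shape)  # (4, 2)
print(onehot)
```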


We will use scikit-learn's train_test_split() to randomly shuffle the data and divide it into two
parts: a training set and a test set. Then, with another helper function, build_dataset(), we
create the vocabulary using a word-count-based approach.
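build_dataset() lives in the book's utils module, which isn't reproduced here. As a rough guide to what it returns, here is a hypothetical re-implementation written only to be consistent with the output shown below (the special tags occupy indices 0 to 3, so the most frequent word, 'the', gets index 4); the real helper may differ in its details:

```python
import collections

def build_dataset(words, n_words):
    """Hypothetical sketch of the utils helper.

    Reserves indices 0-3 for the special tags, then assigns the
    remaining indices to words in order of decreasing frequency.
    """
    count = [('GO', 0), ('PAD', 0), ('EOS', 0), ('UNK', 0)]
    count.extend(collections.Counter(words).most_common(n_words))
    dictionary = {}
    for word, _ in count:
        dictionary[word] = len(dictionary)
    # Map every token of the corpus to its index (UNK for unseen words)
    data = [dictionary.get(word, dictionary['UNK']) for word in words]
    rev_dictionary = {index: word for word, index in dictionary.items()}
    return data, count, dictionary, rev_dictionary

data, count, dictionary, rev_dictionary = build_dataset('the rock is the best'.split(), 10)
print(count[4:])  # word frequencies, most common first
print(data)       # the corpus as indices
```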


You could also plug a pretrained embedding model in at this point to make the model more accurate.


ONEHOT = np.zeros((len(trainset.data), len(trainset.target_names)))
ONEHOT[np.arange(len(trainset.data)), trainset.target] = 1.0

train_X, test_X, train_Y, test_Y, train_onehot, test_onehot = train_test_split(
    trainset.data, trainset.target, ONEHOT, test_size=0.2)


concat = ' '.join(trainset.data).split() 

vocabulary_size = len(list(set(concat) ) ) 

data, count, dictionary, rev_dictionary = build_dataset(concat, vocabulary_size) 
print('vocab from size: %d'%(vocabulary_size) ) 

print('Most common words', count[4:10]) 

print('Sample data', data[:10], [rev_dictionary[i] for i in data[:10]]) 


OUTPUT:
vocab from size: 20465
Most common words [(u'the', 10129), (u'a', 7312), (u'and', 6199), (u'of', 6063), (u'to', 4233),
(u'is', 3378)]
Sample data:
[4, 662, 9, 2543, 8, 22, 4, 3558, 18064, 98] -->
[u'the', u'rock', u'is', u'destined', u'to', u'be', u'the', u'21st', u'centurys', u'new']


There are a few important things to remember when preparing the dataset for RNN models. We need
to explicitly add special tags to the vocabulary to keep track of the start of a sentence, extra
padding, the end of a sentence, and unknown words. Hence, we have reserved the following positions
for the special tags in our vocabulary dictionary:


# Tag to mark the beginning of the sentence
'GO' = 0th position

# Tag to add extra padding in the sentence
'PAD' = 1st position

# Tag to mark the end of the sentence
'EOS' = 2nd position

# Tag to mark the unknown word
'UNK' = 3rd position
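The training code later in this chapter calls str_idx() (also from utils, not shown) to turn sentences into fixed-length rows of indices. A minimal sketch of such a helper, assuming words are looked up in the dictionary, unknown words map to UNK, and short sentences are padded with PAD on the left. Left padding is our own choice here, made so that the last time step, which the classifier reads through outputs[:, -1], always holds a real word:

```python
import numpy as np

GO, PAD, EOS, UNK = 0, 1, 2, 3  # the reserved positions described above

def str_idx(corpus, dic, maxlen, unk=UNK, pad=PAD):
    """Hypothetical stand-in for the utils helper: turn each sentence into a
    fixed-length row of word indices, left-padded with PAD and truncated
    to maxlen words."""
    X = np.full((len(corpus), maxlen), pad, dtype=np.int32)
    for i, sentence in enumerate(corpus):
        ids = [dic.get(word, unk) for word in sentence.split()[:maxlen]]
        # Right-align the words so the last column holds the last word
        if ids:
            X[i, -len(ids):] = ids
    return X

dic = {'GO': 0, 'PAD': 1, 'EOS': 2, 'UNK': 3, 'the': 4, 'rock': 5}
print(str_idx(['the rock rocks'], dic, maxlen=5))  # [[1 1 4 5 3]]
```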
Hyperparameters


We will define the hyperparameters for our model as follows:


size_layer = 128
num_layers = 2
embedded_size = 128
dimension_output = len(trainset.target_names)
learning_rate = 1e-3
maxlen = 50
batch_size = 128
Defining Basic RNN cell model


Now we will create the RNN model, which takes the following input parameters:


size_layer: The number of units in the RNN cell 

num_layers: Number of hidden layers 

embedded_size: The size of the embedding 

dict_size: The vocabulary size 

dimension_output: Number of classes we need to classify 
learning_rate: The learning rate of the optimization algorithm 


This RNN model consists of the following parts:


1. Two placeholders: one to feed the sequence data into the model and one for the output labels.
2. A variable to store the embedding lookup from the dictionary.
3. An RNN layer made of multiple basic RNN cells.
4. Weight and bias variables.
5. The logits computation.
6. The loss computation.
7. An Adam optimizer.
8. The prediction and accuracy calculation.


This model is similar to the CNN model we created in the previous chapter, except for the RNN cell
part:


class Model:
    def __init__(self, size_layer, num_layers, embedded_size,
                 dict_size, dimension_output, learning_rate):

        def cells(reuse=False):
            return tf.nn.rnn_cell.BasicRNNCell(size_layer, reuse=reuse)

        # Step 1: placeholders for the word-index sequences and the one-hot labels
        self.X = tf.placeholder(tf.int32, [None, None])
        self.Y = tf.placeholder(tf.float32, [None, dimension_output])

        # Step 2: embedding lookup table, one trainable vector per word
        encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size], -1, 1))
        encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)

        # Step 3: num_layers stacked basic RNN cells, unrolled over the sequence
        rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])
        outputs, _ = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded, dtype=tf.float32)

        # Step 4: weight and bias of the output layer
        W = tf.get_variable('w', shape=(size_layer, dimension_output),
                            initializer=tf.orthogonal_initializer())
        b = tf.get_variable('b', shape=(dimension_output,), initializer=tf.zeros_initializer())

        # Steps 5-7: logits from the last time step, loss, and optimizer
        self.logits = tf.matmul(outputs[:, -1], W) + b
        self.cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
            logits=self.logits, labels=self.Y))
        self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(self.cost)

        # Step 8: prediction and accuracy
        correct_pred = tf.equal(tf.argmax(self.logits, 1), tf.argmax(self.Y, 1))
        self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))


This is what the complete model looks like after computation. Data flows from the placeholders
we created in step 1 into the embedding layer defined in step 2. Next comes the RNN layer, which
performs the computation in 2 hidden layers of RNN cells. The logits are then computed by
multiplying the RNN layer's output at the last time step by the weight matrix and adding the
bias. Finally, the cost function is defined using softmax cross-entropy.
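For intuition about what the RNN layer and the classifier head compute, here is a NumPy sketch of the same forward pass with toy sizes and random weights: one basic RNN layer (the model above stacks two, feeding the first layer's outputs into the second), logits taken from the last time step, and the softmax cross-entropy cost. It illustrates the math, not TensorFlow's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, time_steps, embedded_size, size_layer, n_classes = 2, 5, 8, 16, 2

# Embedded input: one vector per word, as produced by the embedding lookup
x = rng.normal(size=(batch, time_steps, embedded_size))
W_xh = rng.normal(scale=0.1, size=(embedded_size, size_layer))
W_hh = rng.normal(scale=0.1, size=(size_layer, size_layer))
b_h = np.zeros(size_layer)

# Basic RNN cell: h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h)
h = np.zeros((batch, size_layer))
outputs = []
for t in range(time_steps):
    h = np.tanh(x[:, t] @ W_xh + h @ W_hh + b_h)
    outputs.append(h)
outputs = np.stack(outputs, axis=1)  # (batch, time_steps, size_layer)

# Classifier head on the last time step, like outputs[:, -1] in Model
W = rng.normal(scale=0.1, size=(size_layer, n_classes))
b = np.zeros(n_classes)
logits = outputs[:, -1] @ W + b

# Softmax cross-entropy against one-hot labels, averaged over the batch
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
cost = -(labels * log_probs).sum(axis=1).mean()
print(outputs.shape, logits.shape, round(float(cost), 4))
```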
The image below represents the structure of the RNN layer. It has two basic RNN cells, one for
each hidden layer.
Training


Now it's time to train our model. We reset the TensorFlow graph, create a session, initialize the
model's variables, and start training.


tf.reset_default_graph() 

sess = tf.InteractiveSession() 

model = Model(size_layer,num_layers, embedded_size, vocabulary_size+4, dimension_output, learning_rate) 
sess.run(tf.global_variables_initializer()) 


EARLY_STOPPING, CURRENT_CHECKPOINT, CURRENT_ACC, EPOCH = 5, 0, 0, 0
while True:
    lasttime = time.time()
    # Stop once test accuracy has not improved for EARLY_STOPPING epochs in a row
    if CURRENT_CHECKPOINT == EARLY_STOPPING:
        print('break epoch:%d\n' % (EPOCH))
        break

    train_acc, train_loss, test_acc, test_loss = 0, 0, 0, 0
    # One optimization step per training batch
    for i in range(0, (len(train_X) // batch_size) * batch_size, batch_size):
        batch_x = str_idx(train_X[i:i + batch_size], dictionary, maxlen)
        acc, loss, _ = sess.run([model.accuracy, model.cost, model.optimizer],
                                feed_dict={model.X: batch_x,
                                           model.Y: train_onehot[i:i + batch_size]})
        train_loss += loss
        train_acc += acc

    # Evaluate on the test batches (no optimizer step) against the test labels
    for i in range(0, (len(test_X) // batch_size) * batch_size, batch_size):
        batch_x = str_idx(test_X[i:i + batch_size], dictionary, maxlen)
        acc, loss = sess.run([model.accuracy, model.cost],
                             feed_dict={model.X: batch_x,
                                        model.Y: test_onehot[i:i + batch_size]})
        test_loss += loss
        test_acc += acc

    # Turn the per-batch sums into averages
    train_loss /= (len(train_X) // batch_size)
    train_acc /= (len(train_X) // batch_size)
    test_loss /= (len(test_X) // batch_size)
    test_acc /= (len(test_X) // batch_size)

    if test_acc > CURRENT_ACC:
        print('epoch: %d, pass acc: %f, Current acc: %f' % (EPOCH, CURRENT_ACC, test_acc))
        CURRENT_ACC = test_acc
        CURRENT_CHECKPOINT = 0
    else:
        CURRENT_CHECKPOINT += 1

    print('time taken:', time.time() - lasttime)
    print('epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\n'
          % (EPOCH, train_loss, train_acc, test_loss, test_acc))
    EPOCH += 1


OUTPUT: 

epoch: 0, pass acc: 0.000000, current acc: 0.512695

time taken: 1.4734694957733154 

epoch: 0, training loss: 0.704275, training acc: 0.537405, valid loss: 0.716516, valid acc: 0.512695 


time taken: 1.3296072483062744 
epoch: 1, training loss: 0.627786, training acc: 0.651278, valid loss: 0.814184, valid acc: 0.506836 


time taken: 1.3306574821472168 
epoch: 2, training loss: 0.518082, training acc: 0.750000, valid loss: 0.951266, valid acc: 0.510742 


time taken: 1.330101728439331 
epoch: 3, training loss: 0.379654, training acc: 0.833807, valid loss: 1.201963, valid acc: 0.501465 


time taken: 1.3322148323059082 
epoch: 4, training loss: 0.246652, training acc: 0.902699, valid loss: 1.447271, valid acc: 0.502441 


time taken: 1.3261046409606934 
epoch: 5, training loss: 0.165571, training acc: 0.934067, valid loss: 1.933259, valid acc: 0.494141 
break epoch:6
Evaluation


Let's look at our results. Once the model is trained, we can feed it the test data that we
prepared earlier in this chapter and evaluate the predictions. In this case, we will use a few
different metrics to evaluate our model: precision, recall, and F1 score.


To evaluate a model, it is important to choose the right metrics. The F1 score is often considered
more practical than accuracy, because it accounts for both false positives and false negatives.


Here they are in simple terms:


• Accuracy: The number of correct predictions divided by the total number of examples
evaluated.

• Precision: Of all the examples the model predicted as positive, the fraction that really are
positive. A high score means your positive predictions are usually right; a low score means you
predicted a lot of positives where there were none.

• Recall: Of all the examples that really are positive, the fraction the model found. If this
score is high, you didn't miss many positives; as it gets lower, you are failing to predict
positives that are actually there.

• F1-score: The harmonic mean of precision and recall, giving both metrics equal weight. The
higher the F1-score, the better.
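The definitions above can be checked by hand. This small, self-contained sketch counts true positives, false positives, and false negatives for one class on made-up labels; classification_report() applies the same idea to each class in turn, treating it as the positive class:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up example: 4 actual positives; the model predicts 3 positives, 2 of them correct
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))  # precision = 2/3, recall = 1/2, F1 = 4/7
```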


logits = sess.run(model.logits, feed_dict={model.X:str_idx(test_X, dictionary, maxlen)}) 
print(metrics.classification_report(test_Y, np.argmax(logits,1), target_names = trainset.target_names) ) 


OUTPUT:
             precision    recall  f1-score   support

   negative       0.64      0.74      0.69      1080
   positive       0.68      0.58      0.63      1053

avg / total       0.66      0.66      0.66      2133


So here we can see that our average F1 score is 66% when using basic RNN cells. Let's see whether
this can be improved by using other variations of RNNs.
LSTM Architecture


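The drawbacks of the basic RNN, chiefly its difficulty in learning long-range dependencies because
gradients vanish or explode over many time steps, pushed researchers to develop a new variant of
the RNN model called Long Short-Term Memory (LSTM). The LSTM's advantage comes from its use of
gates to control the memorizing process. The following diagram shows an LSTM cell: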





Figure 5.x: An LSTM unit (source: http://colah.github.io/posts/2015-08-Understanding-LSTMs)


An LSTM consists of three main components, labeled 1, 2, and 3 in the preceding diagram:


1. The forget gate f(t): The LSTM has a special architecture that enables it to forget
unnecessary information. A sigmoid layer takes the input X(t) and the previous output h(t-1) and
decides which parts of the old cell state c(t-1) should be removed (by outputting a value close
to 0). The result of this step is f(t)*c(t-1).

2. The next step is to decide what information from the new input X(t) to store in the cell
state. A sigmoid layer (the input gate) decides which of the new information should be updated or
ignored, and a tanh layer creates a vector of all the candidate values from the new input. These
two are multiplied, and this new memory is then added to the gated old memory f(t)*c(t-1) to give
c(t).

3. Finally, we need to decide what to output. A sigmoid layer decides which parts of the cell
state we are going to output. Then, we pass the cell state through a tanh and multiply the result
by the output of the sigmoid gate, so that we only output the parts we decided to.


So, through these three steps, an LSTM cell in our model learns what information to store in
long-term memory and what to get rid of.
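The three steps can be written out as a single LSTM time step in NumPy. This is an illustrative sketch with random weights: packing the four gate pre-activations into one matrix, and their order, are our own choices (TensorFlow's LSTMCell uses its own internal layout):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev, x_t] to the four gate
    pre-activations, stacked here as (forget, input, candidate, output)."""
    z = np.concatenate([h_prev, x_t], axis=-1) @ W + b
    f, i, g, o = np.split(z, 4, axis=-1)
    f = sigmoid(f)            # 1. forget gate: what to drop from c_prev
    i = sigmoid(i)            # 2a. input gate: which new values to write
    g = np.tanh(g)            # 2b. candidate values from the new input
    c_t = f * c_prev + i * g  # new cell state
    o = sigmoid(o)            # 3. output gate: which parts of the state to expose
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
input_size, hidden = 3, 4
W = rng.normal(scale=0.1, size=(hidden + input_size, 4 * hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros((1, hidden)), np.zeros((1, hidden))
for x_t in rng.normal(size=(5, 1, input_size)):  # 5 time steps, batch of 1
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (1, 4) (1, 4)
```

With all weights at zero, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved at each step, which is a quick way to sanity-check the wiring.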
Implementing LSTM Model


The process we followed to build the basic RNN model remains the same, except for the model
definition. So let's implement that and check the performance of the new model.


The code file for the model can be viewed at https://github.com/PacktPublishing/Python-Deep-Learning-Projects/blob/master/Chapter%205/2.%20rnn_lstm.py
Defining LSTM model


Again, most of the code remains the same; the major change is to use tf.nn.rnn_cell.LSTMCell()
instead of tf.nn.rnn_cell.BasicRNNCell(). When initializing the LSTM cell, we use an orthogonal
initializer, which generates a random orthogonal matrix; this is an effective way to help combat
exploding and vanishing gradients.
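To see what "orthogonal" buys us, here is a NumPy sketch that builds a random orthogonal matrix from the QR decomposition of a Gaussian matrix (a common recipe, similar in spirit to tf.orthogonal_initializer but not its exact code). Multiplying by an orthogonal matrix preserves vector length, so repeated multiplications across time steps neither shrink nor blow up the signal through that linear part:

```python
import numpy as np

def orthogonal(shape, rng):
    """Random orthogonal matrix via QR decomposition of a Gaussian matrix."""
    a = rng.normal(size=shape)
    q, r = np.linalg.qr(a)
    # Flip column signs to match diag(r), so the result is uniformly distributed
    q *= np.sign(np.diag(r))
    return q

rng = np.random.default_rng(0)
Q = orthogonal((4, 4), rng)
v = rng.normal(size=4)
print(np.allclose(Q.T @ Q, np.eye(4)))                       # True: Q is orthogonal
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))  # True: length is preserved
```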


class Model:
    def __init__(self, size_layer, num_layers, embedded_size,
                 dict_size, dimension_output, learning_rate):

        # The only change from the basic RNN: LSTM cells with orthogonal initialization
        def cells(reuse=False):
            return tf.nn.rnn_cell.LSTMCell(size_layer, initializer=tf.orthogonal_initializer(),
                                           reuse=reuse)

        self.X = tf.placeholder(tf.int32, [None, None])
        self.Y = tf.placeholder(tf.float32, [None, dimension_output])

        encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size], -1, 1))
        encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)

        rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])
        outputs, _ = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded, dtype=tf.float32)

        W = tf.get_variable('w', shape=(size_layer, dimension_output),
                            initializer=tf.orthogonal_initializer())
        b = tf.get_variable('b', shape=(dimension_output,), initializer=tf.zeros_initializer())

        # Logits from the last time step, loss, and optimizer
        self.logits = tf.matmul(outputs[:, -1], W) + b
        self.cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
            logits=self.logits, labels=self.Y))
        self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(self.cost)

        correct_pred = tf.equal(tf.argmax(self.logits, 1), tf.argmax(self.Y, 1))
        self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))


So this is what the architecture of the LSTM model looks like; it is the same as the previous basic
model, except that the basic RNN cells are replaced by LSTM cells.
Training


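Now let's train our model. As before, we first reset the graph, create a new session, instantiate
the Model, and initialize its variables; the training loop itself is unchanged.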


EARLY_STOPPING, CURRENT_CHECKPOINT, CURRENT_ACC, EPOCH = 5, 0, 0, 0
while True:
    lasttime = time.time()
    if CURRENT_CHECKPOINT == EARLY_STOPPING:
        print('break epoch:%d\n' % (EPOCH))
        break

    train_acc, train_loss, test_acc, test_loss = 0, 0, 0, 0
    for i in range(0, (len(train_X) // batch_size) * batch_size, batch_size):
        batch_x = str_idx(train_X[i:i + batch_size], dictionary, maxlen)
        acc, loss, _ = sess.run([model.accuracy, model.cost, model.optimizer],
                                feed_dict={model.X: batch_x,
                                           model.Y: train_onehot[i:i + batch_size]})
        train_loss += loss
        train_acc += acc

    # Evaluate against the test labels (test_onehot), not the training labels
    for i in range(0, (len(test_X) // batch_size) * batch_size, batch_size):
        batch_x = str_idx(test_X[i:i + batch_size], dictionary, maxlen)
        acc, loss = sess.run([model.accuracy, model.cost],
                             feed_dict={model.X: batch_x,
                                        model.Y: test_onehot[i:i + batch_size]})
        test_loss += loss
        test_acc += acc

    train_loss /= (len(train_X) // batch_size)
    train_acc /= (len(train_X) // batch_size)
    test_loss /= (len(test_X) // batch_size)
    test_acc /= (len(test_X) // batch_size)

    if test_acc > CURRENT_ACC:
        print('epoch: %d, pass acc: %f, Current acc: %f' % (EPOCH, CURRENT_ACC, test_acc))
        CURRENT_ACC = test_acc
        CURRENT_CHECKPOINT = 0
    else:
        CURRENT_CHECKPOINT += 1

    print('time taken:', time.time() - lasttime)
    print('epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\n'
          % (EPOCH, train_loss, train_acc, test_loss, test_acc))
    EPOCH += 1


OUTPUT:
('time taken:', 18.061596155166626)
epoch: 10, training loss: 0.015714, training acc: 0.994910, valid loss: 4.252270, valid acc: 0.500000

('time taken:', 17.786305904388428)
epoch: 11, training loss: 0.011198, training acc: 0.995975, valid loss: 4.644272, valid acc: 0.502441

('time taken:', 19.031064987182617)
epoch: 12, training loss: 0.009245, training acc: 0.996686, valid loss: 4.575824, valid acc: 0.499512

('time taken:', 16.996762990951538)
epoch: 13, training loss: 0.006528, training acc: 0.997751, valid loss: 4.449901, valid acc: 0.501953

('time taken:', 17.008245944976807)
epoch: 14, training loss: 0.011770, training acc: 0.995739, valid loss: 4.282045, valid acc: 0.499023

break epoch:15


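You will notice that, even with the same model configuration, the LSTM-based model takes much
longer to train (in the runs shown, about 17 to 19 seconds per epoch versus about 1.3 seconds for
the basic RNN). This is expected, since each LSTM cell computes four affine transformations per
time step (three gates plus the candidate values) instead of one.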
Evaluation


Now let's compute the metrics again and compare the performance.


logits = sess.run(model.logits, feed_dict={model.X:str_idx(test_X, dictionary, maxlen)}) 
print(metrics.classification_report(test_Y, np.argmax(logits,1), target_names = trainset.target_names) ) 


OUTPUT:
             precision    recall  f1-score   support

   negative       0.75      0.71      0.73      1085
   positive       0.72      0.75      0.72      1048

avg / total       0.73      0.73      0.73      2133


So, we can clearly see a boost in the performance of the model! With the LSTM, the F1 score rises
to 73%, compared with 66% for our previous basic RNN model, which is quite a good improvement of 7
percentage points.
Sequence-to-Sequence model


In this section, we'll implement a seq2seq model (an Encoder-Decoder RNN) based on the LSTM
unit for a simple sequence-to-sequence question answering task. This model can be trained to map
an input sequence (a question) to an output sequence (an answer), and the two need not have the
same length.


This type of model has shown impressive performance in various other tasks, such as speech
recognition, neural machine translation, question answering, and image caption generation.


[Figure: encoder-decoder RNN; the encoder reads A B C D <eos>, and the decoder, fed X Y Z, outputs X Y Z <eos>]


In the encoder-decoder structure, one RNN (blue) encodes the input and a second RNN (red) 
calculates the target values. One essential step is to let the encoder and decoder communicate. In 
the simplest approach, you use the last hidden state of the encoder to initialize the decoder. 

Other approaches let the decoder attend to different parts of the encoded input at different 
timesteps in the decoding process. 
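The data flow of the simplest approach, where the encoder's last hidden state initializes the decoder, can be sketched in NumPy. Everything here is illustrative: the weights are random and untrained, the token ids GO = 0 and EOS = 1 are hypothetical, and the decoder feeds back its own greedy predictions (as at inference time), so the generated tokens are meaningless apart from showing that the output length does not depend on the input length:

```python
import numpy as np

rng = np.random.default_rng(0)
emb, hidden, vocab_out = 8, 16, 10

def rnn_step(x, h, W_x, W_h):
    return np.tanh(x @ W_x + h @ W_h)

# Separate (random, untrained) weights for the encoder and the decoder
enc_Wx = rng.normal(scale=0.3, size=(emb, hidden))
enc_Wh = rng.normal(scale=0.3, size=(hidden, hidden))
dec_Wx = rng.normal(scale=0.3, size=(emb, hidden))
dec_Wh = rng.normal(scale=0.3, size=(hidden, hidden))
W_out = rng.normal(scale=0.3, size=(hidden, vocab_out))
out_embeddings = rng.normal(size=(vocab_out, emb))

# Encoder: read the whole embedded input (e.g. A B C D) and keep only the final state
source = rng.normal(size=(4, emb))
h = np.zeros(hidden)
for x in source:
    h = rnn_step(x, h, enc_Wx, enc_Wh)

# Decoder: start from the encoder's final state and emit tokens greedily
GO, EOS = 0, 1
token, generated = GO, []
for _ in range(6):  # cap the output length
    h = rnn_step(out_embeddings[token], h, dec_Wx, dec_Wh)
    token = int(np.argmax(h @ W_out))
    if token == EOS:
        break
    generated.append(token)
print(generated)
```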


So let's do it and see how it performs. 


The code for the model can be found at: https://github.com/PacktPublishing/Python-Deep-Learning-Projects/blob/master/Chapter%205/3.%20rnn_lstm_seq2seq.py
Data Preparation


Here we will build our question answering system. For this project, we need a dataset of question
and answer pairs, as shown in the following sample. Both columns contain sequences of words,
which is what we need to feed into our seq2seq model. Note also that the sentences can have
different lengths:
Question                      | Answer
hi                            | hi there
good morning                  | good morning
good afternoon                | good afternoon
good evening                  | good evening
good night                    | good night have a nice dream
how are you                   | i am fine thank you
how are you doing             | doing good thank you
what is your name             | my name is papaya and what do you want me to call you dear sir or mada ...
whats your name               | my name is papaya may i also have your name please
may i have your name          | sure my name is papaya may i also have your name please
are you a boy or a girl       | i am a boy
are you a man or a woman      | i am still a boy
are you a woman or a man      | i am a boy
where do you live             | miami florida
in which city do you live     | i live in miami florida
in which country do you live  | i live in the united states
where are you                 | currently i am in miami florida
where are you now             | i am in miami florida
why are we here               | we are here to communicate with each other
whats up                      | not much
what a nice day it is         | yes it is
another                       | why are ets eyes so big he saw the phone bill for phoning home no free ...
thats really a funny joke     | i think youre making that joke
Let's load the data and process it the same way as before, using build_dataset(). In the end, we
will have, for each side (questions and answers), a dictionary that maps each word to an integer
index, along with the word counts. We also have the 4 extra special tags that we discussed earlier
in this chapter:


import collections

import numpy as np
import tensorflow as tf

from utils import *

file_path = './conversation_data/'

# from.txt holds the questions and to.txt the answers, one per line
with open(file_path + 'from.txt', 'r') as fopen:
    text_from = fopen.read().lower().split('\n')
with open(file_path + 'to.txt', 'r') as fopen:
    text_to = fopen.read().lower().split('\n')
print('len from: %d, len to: %d' % (len(text_from), len(text_to)))

# Build a separate vocabulary for the question side...
concat_from = ' '.join(text_from).split()
vocabulary_size_from = len(set(concat_from))
data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(
    concat_from, vocabulary_size_from)

# ...and for the answer side
concat_to = ' '.join(text_to).split()
vocabulary_size_to = len(set(concat_to))
data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(
    concat_to, vocabulary_size_to)

# Indices of the special tags
GO = dictionary_from['GO']
PAD = dictionary_from['PAD']
EOS = dictionary_from['EOS']
UNK = dictionary_from['UNK']
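The training loop later in this section slices two lists, X and Y, which hold the questions and answers as lists of word indices; the code that builds them isn't shown here. A minimal sketch, assuming each sentence is mapped word by word and unknown words fall back to UNK (the helper name to_indices is hypothetical):

```python
def to_indices(sentences, dictionary, unk):
    """Hypothetical helper: convert each sentence into a list of word indices."""
    return [[dictionary.get(word, unk) for word in sentence.split()]
            for sentence in sentences]

# With the variables defined above, this would be used as:
# X = to_indices(text_from, dictionary_from, UNK)
# Y = to_indices(text_to, dictionary_to, dictionary_to['UNK'])

toy_dictionary = {'GO': 0, 'PAD': 1, 'EOS': 2, 'UNK': 3, 'hi': 4, 'there': 5}
print(to_indices(['hi there', 'hi you'], toy_dictionary, toy_dictionary['UNK']))
# [[4, 5], [4, 3]]
```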
Defining seq2seq model


Below is the TensorFlow seq2seq model definition. We use an embedding layer to go from the integer
representation of the input to a vector representation. This seq2seq model has 4 major
components: the embedding layers, the encoder, the decoder, and the cost/optimizer, as you can see
in the image:
[Figure: TensorBoard graph of the seq2seq model, showing the encoder_embeddings, decoder_embeddings, encoder, decoder, and logits blocks fed by placeholders]


class Chatbot:
    def __init__(self, size_layer, num_layers, embedded_size,
                 from_dict_size, to_dict_size, learning_rate, batch_size):

        def cells(reuse=False):
            return tf.nn.rnn_cell.LSTMCell(size_layer, initializer=tf.orthogonal_initializer(),
                                           reuse=reuse)

        # Padded index sequences for questions (X) and answers (Y), plus their lengths
        self.X = tf.placeholder(tf.int32, [None, None])
        self.Y = tf.placeholder(tf.int32, [None, None])
        self.X_seq_len = tf.placeholder(tf.int32, [None])
        self.Y_seq_len = tf.placeholder(tf.int32, [None])

        with tf.variable_scope("encoder_embeddings"):
            encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))
            encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)
            # Drop the last time step of X so that prepending GO keeps the length unchanged
            main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])

        with tf.variable_scope("decoder_embeddings"):
            # Decoder input: GO followed by X without its last token. Its indices come from
            # the question vocabulary, so it is looked up in encoder_embeddings.
            decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)
            decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))
            decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)

        with tf.variable_scope("encoder"):
            rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])
            # Only the final state is kept; it summarizes the whole question
            _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded, dtype=tf.float32)

        with tf.variable_scope("decoder"):
            rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])
            # The decoder starts from the encoder's final state
            outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded,
                                           initial_state=last_state, dtype=tf.float32)

        with tf.variable_scope("logits"):
            # One score per word of the answer vocabulary at every time step
            self.logits = tf.layers.dense(outputs, to_dict_size)
            print(self.logits)
        masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)

        with tf.variable_scope("cost"):
            self.cost = tf.contrib.seq2seq.sequence_loss(logits=self.logits,
                                                         targets=self.Y,
                                                         weights=masks)
        with tf.variable_scope("optimizer"):
            self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(self.cost)
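The masks and sequence_loss() in the model above compute an average per-token cross-entropy in which each position is weighted by the mask. The NumPy sketch below mirrors that idea (it is not TensorFlow's code): sequence_mask() marks the valid positions, and masked_sequence_loss() averages the loss over those positions only:

```python
import numpy as np

def sequence_mask(lengths, maxlen):
    """1.0 for positions before each sequence's length, 0.0 after
    (the same idea as tf.sequence_mask)."""
    return (np.arange(maxlen)[None, :] < np.asarray(lengths)[:, None]).astype(np.float32)

def masked_sequence_loss(logits, targets, weights):
    """Average per-token cross-entropy over positions with non-zero weight."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability of the correct token at every (batch, time) position
    token_nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    return (token_nll * weights).sum() / weights.sum()

masks = sequence_mask([3, 1], maxlen=4)
print(masks)
# [[1. 1. 1. 0.]
#  [1. 0. 0. 0.]]

logits = np.zeros((2, 4, 5))              # uniform over a 5-word vocabulary
targets = np.zeros((2, 4), dtype=int)
print(masked_sequence_loss(logits, targets, masks))  # log(5), about 1.609
```

Note that pad_sentence_batch(), defined next, reports a length of 50 for every sentence, so in this chapter's setup the mask is all ones and PAD positions also count toward the loss.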
Hyperparameters


Now that our model definition is ready, we will define the hyperparameters. We keep most of the
configuration the same as in the previous model:


size_layer = 128 
num_layers = 2 
embedded_size = 128 
learning_rate = 0.001 
batch_size = 32 

epoch = 50 
Training


Now, let's train the model. We need two helper functions: one to pad the sentences and one to
calculate the accuracy of the model.


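def pad_sentence_batch(sentence_batch, pad_int):
    # Pad every sentence to a fixed 50 tokens. Every sequence length is reported
    # as 50, so padded positions also count toward the loss mask.
    padded_seqs = []
    seq_lens = []
    max_sentence_len = 50
    for sentence in sentence_batch:
        padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))
        seq_lens.append(max_sentence_len)
    return padded_seqs, seq_lens


def check_accuracy(logits, Y):
    # `logits` holds predicted token ids; compute the fraction of matching
    # positions per sentence, then average over the batch
    acc = 0
    for i in range(logits.shape[0]):
        internal_acc = 0
        for k in range(len(Y[i])):
            if Y[i][k] == logits[i][k]:
                internal_acc += 1
        # float() avoids integer division under Python 2
        acc += internal_acc / float(len(Y[i]))
    return acc / logits.shape[0]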
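check_accuracy() also counts every position where the model correctly predicts PAD, which inflates the score when answers are much shorter than the 50-token padding. One possible refinement (our own variant, not part of the book's code) is to skip positions where the target is PAD; PAD = 1 below is a toy value:

```python
def check_accuracy_ignoring_pad(logits, Y, pad_int):
    """Token accuracy computed only over the non-PAD positions of the targets.
    `logits` holds predicted token ids, as in check_accuracy()."""
    correct, total = 0, 0
    for pred_row, target_row in zip(logits, Y):
        for pred, target in zip(pred_row, target_row):
            if target == pad_int:
                continue
            total += 1
            correct += int(pred == target)
    return correct / float(total) if total else 0.0

PAD = 1
targets = [[4, 5, PAD, PAD], [6, PAD, PAD, PAD]]
predicted = [[4, 7, PAD, PAD], [6, PAD, PAD, PAD]]
print(check_accuracy_ignoring_pad(predicted, targets, PAD))  # 2 of 3 real tokens correct
```

On the same toy batch (passed as a NumPy array), check_accuracy() would report (3/4 + 4/4) / 2 = 0.875, because the matching PAD positions count as correct predictions.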


We initialize our model and run the session for the defined number of epochs.


tf.reset_default_graph()
sess = tf.InteractiveSession()
# +4 for the GO, PAD, EOS, and UNK tags
model = Chatbot(size_layer, num_layers, embedded_size, vocabulary_size_from + 4,
                vocabulary_size_to + 4, learning_rate, batch_size)
sess.run(tf.global_variables_initializer())

# Create the prediction op once, instead of adding a new op to the graph on every batch
predict_op = tf.argmax(model.logits, 2)

for i in range(epoch):
    total_loss, total_accuracy = 0, 0
    for k in range(0, (len(text_from) // batch_size) * batch_size, batch_size):
        batch_x, seq_x = pad_sentence_batch(X[k: k + batch_size], PAD)
        batch_y, seq_y = pad_sentence_batch(Y[k: k + batch_size], PAD)
        predicted, loss, _ = sess.run([predict_op, model.cost, model.optimizer],
                                      feed_dict={model.X: batch_x,
                                                 model.Y: batch_y,
                                                 model.X_seq_len: seq_x,
                                                 model.Y_seq_len: seq_y})
        total_loss += loss
        total_accuracy += check_accuracy(predicted, batch_y)
    total_loss /= (len(text_from) // batch_size)
    total_accuracy /= (len(text_from) // batch_size)
    print('epoch: %d, avg loss: %f, avg accuracy: %f' % (i + 1, total_loss, total_accuracy))








OUTPUT:

epoch: 47, avg loss: 0.682934, avg accuracy: 0.000000
epoch: 48, avg loss: 0.680367, avg accuracy: 0.000000
epoch: 49, avg loss: 0.677882, avg accuracy: 0.000000
epoch: 50, avg loss: 0.678484, avg accuracy: 0.000000
...
epoch: 1133, avg loss: 0.000464, avg accuracy: 1.000000
epoch: 1134, avg loss: 0.000462, avg accuracy: 1.000000
epoch: 1135, avg loss: 0.000460, avg accuracy: 1.000000
epoch: 1136, avg loss: 0.000457, avg accuracy: 1.000000
Evaluation


So, after running the training process for a few hours on a GPU, you can see that the accuracy on
the training data has reached 1.0 and the loss has dropped significantly, to about 0.00046. Let's
see how the model performs when we ask it some generic questions.


To make predictions, we create a predict() function, which takes raw text as input and returns
the model's response to the question. Because pad_sentence_batch() pads to a fixed 50 tokens, the
input should not exceed 50 words. As a quick fix for out-of-vocabulary (OOV) words, we replace
them with PAD.


def predict(sentence):
    # Map each word to its index; out-of-vocabulary words become PAD
    X_in = [dictionary_from.get(word, PAD) for word in sentence.split()]

    # The graph was built for a fixed batch_size, so place the sentence
    # in the first row of an otherwise all-zero batch
    test, seq_x = pad_sentence_batch([X_in], PAD)
    input_batch = np.zeros([batch_size, seq_x[0]])
    input_batch[0] = test[0]

    log = sess.run(tf.argmax(model.logits, 2),
                   feed_dict={model.X: input_batch,
                              model.X_seq_len: seq_x,
                              model.Y_seq_len: seq_x})

    # Convert the predicted indices of the first row back into words
    result = ' '.join(rev_dictionary_to[i] for i in log[0])
    return result


After training the model for the first 50 epochs, we got the following results:


>> predict('where do you live') 
>> i PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD 
PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD 


>> predict('how are you ?')
>> i am PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD 
PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD 


When the model was trained for 1136 epochs: 


>> predict('where do you live') 
>> miami florida PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD 
PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD 


>> predict('how are you ?')


>> i am fine thank you PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD 
PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD 


That's impressive, right? The model now returns the right answers to these questions, generating
them word by word. (Keep in mind that both questions come from the training data.)
Summary


In this chapter, we covered basic RNN cells, LSTM cells, and the seq2seq model for building
language models that can be used for multiple NLP tasks. We implemented a chatbot from scratch
that answers questions by generating sequences of words learned from the provided dataset.


The experience in this exercise demonstrates the value of the LSTM as an often necessary
component of the RNN. With the LSTM, we were able to see improvements over past CNN models, in
that:


1. The LSTM was able to preserve state information. 
2. The length of sentences for both inputs and outputs could be variable and different. 
3. The LSTM was able to adequately handle complex context. 


Specifically, we: 


1. Gained an intuition about the RNN and its primary forms
2. Implemented a language model using an RNN
3. Learned about the Long Short-Term Memory (LSTM) model
4. Implemented the LSTM language model and compared it to the RNN
5. Implemented an Encoder-Decoder RNN based on the LSTM unit for a simple sequence-to-sequence
question answering task


With the right training data, it would be possible to use this model to achieve the hypothetical
client's goal (the restaurant chain) of building a robust chatbot (in combination with other
computational linguistics technologies we've explored) that could automate the over-the-phone
food ordering process.


Well Done! 
Generative Language modeling for content creation


This work is certainly getting exciting, and the word is out that we're demonstrating a
professional set of deep learning capabilities in producing solutions for a wide range of business
use cases! As data scientists, we understand the transferability of our skills. We know we can
provide value by applying core skills to problems that are similar in structure, even if they seem
different at first glance. This couldn't be more true than in our next deep learning project, in
which a creative group has (hypothetically) asked us to help produce original content for movie
scripts, song lyrics, and even music!


How can we leverage our experience in solving problems for restaurant chains in such a different
industry? Let's explore what we know and what we're going to be asked to do. In past projects, we
demonstrated that we could take an image as input and output a class label (Chapter 2); we
trained a model to take text as input and output sentiment classifications (Chapter 3); we built
an NLP pipeline for an open-domain question answering chatbot, where we took text as input and
identified text in a corpus to present as the appropriate output (Chapter 4); and we expanded that
chatbot functionality to serve a restaurant with an automated ordering system (Chapter 5).


Define the goal: 

In this next project, we're going to take the next step in our computational linguistics journey in
Deep Learning Projects in Python and GENERATE new content for our client. We need to help
them by providing a deep learning solution that generates new content that can be used in movie
scripts, song lyrics, and music.


In this chapter, we will implement generative models that can create content using LSTMs,
variational autoencoders, and GANs. We will implement models for both text and images, which can
generate content for artists and various businesses.


In this chapter, we will learn about:


1. Text generation with LSTM
2. The additional power of a bidirectional LSTM for text generation
3. Deep (multi-layer) LSTM to generate lyrics for a song
4. Deep (multi-layer) LSTM to generate the music for a song
Text generation with LSTM


In this section, we'll explore how recurrent neural networks can be used to generate sequence
data. The universal way to generate sequence data in deep learning is to train a network (usually
an RNN or a convnet) to predict the next token or next few tokens in a sequence, using the
previous tokens as input. For instance, given the input sequence "i love to work in deep
learning", we train the network to predict the target: the character that follows it.


When working with textual data, tokens are typically words or characters. Any network that
can model the probability of the next token given the previous ones is called a language model.
A language model captures the latent space of language, that is, its statistical structure.

Once we have trained our language model, we can feed it some initial text and ask it to generate
the next token. We then append the generated token to the input and feed it back into the model
to predict the following token, and so on. For our hypothetical use case, our creative client will
later provide examples of text, and we will use this model to create novel content in that style.
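The following sketch makes the predict, append, and repeat loop concrete. It swaps the LSTM for a character bigram model, a count-based language model that predicts the next character from the current one only. The helper names (train_bigram_model, next_char_probs, generate) are hypothetical and not part of the chapter's code. Only the generation loop is meant to carry over.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev_char, next_char in zip(text, text[1:]):
        counts[prev_char][next_char] += 1
    return counts


def next_char_probs(model, prev_char):
    """Turn the follower counts of prev_char into a probability distribution."""
    followers = model[prev_char]
    total = sum(followers.values())
    return {c: n / total for c, n in followers.items()}


def generate(model, seed, n_chars):
    """Autoregressive loop: predict one character, append it, repeat."""
    text = seed
    for _ in range(n_chars):
        probs = next_char_probs(model, text[-1])
        if not probs:  # the last character was never followed by anything
            break
        # Greedy choice: take the most likely next character
        text += max(probs, key=probs.get)
    return text


model = train_bigram_model("abab abab")
print(generate(model, "a", 4))
```

In the LSTM version, the model conditions on the whole window of previous characters rather than on the last character alone. The greedy `max` step is also replaced by the temperature-based sampling described later in this section.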

First, we will import all the modules required to build the text generative model. In this chapter,
we will use the Keras API to build the models and the Keras utilities to download the dataset.
To build a text generation model, we need lots of plain text data.


Code Link : Python-Deep-Learning-Projects/Chapter 6/Basics/generative_text.py 


import keras
import numpy as np
from keras import layers

# Gather data
path = keras.utils.get_file(
    'sample.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')

text = open(path).read().lower()
print('Number of characters in corpus:', len(text))


Data pre-processing 


Let's perform the data pre-processing to convert the raw data into the encoded form. We will 
extract fixed length sentences, encode them using one-hot encoding and then create a tensor of 
shape (sequence, maxlen, unique_characters) as shown in the figure below. Simultaneously, 
we'll also prepare the target vector y, which contains the corresponding next character that comes 
after each extracted sequence. 


# Length of extracted character sequences
maxlen = 100

# We sample a new sequence every 5 characters
step = 5

# List to hold extracted sequences
sentences = []

# List to hold the target characters
next_chars = []

# Extracting sentences and the next characters.
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])

print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))

# Dictionary mapping unique characters to their index in "chars"
char_indices = dict((char, chars.index(char)) for char in chars)

# Converting characters into one-hot encoding
# (dtype=bool: the np.bool alias was removed in recent NumPy versions)
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
Figure: the corpus text split into input sequences X (sequence length × unique characters) and their next-character outputs Y
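To check the shapes produced by this pre-processing on a small input, the following sketch wraps the same slicing and one-hot logic in a hypothetical vectorize() helper. It runs the helper on an 11-character string with maxlen=3 and step=2.

```python
import numpy as np


def vectorize(text, maxlen, step):
    """Slice text into windows and one-hot encode windows and next characters."""
    chars = sorted(set(text))
    char_indices = {c: i for i, c in enumerate(chars)}
    starts = range(0, len(text) - maxlen, step)
    sentences = [text[i:i + maxlen] for i in starts]
    next_chars = [text[i + maxlen] for i in starts]

    x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
    y = np.zeros((len(sentences), len(chars)), dtype=bool)
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = True
        y[i, char_indices[next_chars[i]]] = True
    return x, y, chars


x, y, chars = vectorize("hello world", maxlen=3, step=2)
print(x.shape, y.shape)
```

Four windows start at positions 0, 2, 4, and 6 ("hel", "llo", "o w", "wor"). Each target is the character immediately after its window, and every row of x and y contains exactly one 1.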


Defining Model 


This network consists of a single hidden LSTM layer with 128 memory units, followed by a Dense
classifier with a softmax activation over all possible characters. Since our targets are one-hot
encoded, we'll use categorical_crossentropy as the loss to train the model.


model = keras.models.Sequential() 
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars)))) 
model.add(layers.Dense(len(chars), activation='softmax')) 


optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer ) 
Figure: model summary (input (None, 100, 53); lstm_1: LSTM, output (None, 128); dense_1: Dense, output (None, 53))
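As a sanity check on the model summary, the parameter counts can be worked out by hand. A Keras LSTM layer has four gates, and each gate has an input weight matrix, a recurrent weight matrix, and a bias vector. That gives 4 × (units × (input_dim + units) + units) parameters. A Dense layer has input_dim × units weights plus units biases. The sketch below assumes the 53 unique characters shown in the summary.

```python
def lstm_params(input_dim, units):
    # 4 gates x (input weights + recurrent weights + bias)
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # weight matrix + bias vector
    return input_dim * units + units


n_chars = 53  # unique characters, as in the model summary
lstm = lstm_params(n_chars, 128)
dense = dense_params(128, n_chars)
print(lstm, dense, lstm + dense)
```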


Train Model 





When generating text, the way we choose the next character is crucially important. The most
obvious approach, greedy sampling (always picking the most likely character), leads to
repetitive, predictable strings that do not look like coherent language. This is why we use a
different approach called stochastic sampling. It draws the next character at random from the
predicted probability distribution, so less likely characters are sometimes chosen. Here is the
code we use to reweight the original probability distribution coming out of the model and draw
a character index from it (the sampling function).


def sample(preds, temperature=1.0):
    # Reweight the predicted distribution by the temperature
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Draw one character index from the reweighted distribution
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
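The sketch below shows what the temperature does to the distribution without the randomness of the draw. It applies the same reweighting arithmetic as sample() to a made-up three-character distribution. Instead of sampling, it returns the reweighted probabilities. The reweight() name and the example probabilities are illustrative only.

```python
import numpy as np


def reweight(preds, temperature):
    # Same arithmetic as sample(), but return the distribution instead of drawing from it
    preds = np.asarray(preds, dtype='float64')
    logits = np.log(preds) / temperature
    exp_preds = np.exp(logits)
    return exp_preds / np.sum(exp_preds)


preds = [0.5, 0.3, 0.2]
for t in [0.2, 1.0, 1.2]:
    print(t, np.round(reweight(preds, t), 3))
```

A low temperature concentrates almost all of the probability on the most likely character, approaching greedy sampling. A temperature above 1 flattens the distribution, so less likely characters are drawn more often.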
Now, we repeatedly train and generate text. We train for 30 epochs, fitting the model for one
epoch at a time. After each epoch, we randomly select a seed text and convert it into one-hot
encoding. We then predict the next character and append it to the seed text, repeating until 100
new characters have been generated.


After every epoch, we generate text at several different temperatures. This allows you to see how
the generated text evolves as the model begins to converge, as well as the impact of temperature
on the sampling strategy.

Temperature is a hyperparameter of the sampling step, not of the LSTM itself. It controls the
randomness of the predictions by scaling the logits before the softmax is applied. In the sample
function above, the log-probabilities are divided by the temperature before being renormalized.
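Written out, the reweighting performed by sample() turns the model's predicted probabilities p_i into new probabilities for a temperature T:

```latex
p_i^{(T)} = \frac{\exp\left(\log p_i / T\right)}{\sum_j \exp\left(\log p_j / T\right)}
          = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}}
```

The behavior at different temperatures follows directly from this formula:
- With T = 1, the original distribution is recovered.
- As T approaches 0, the largest p_i dominates, which amounts to greedy sampling.
- As T grows, the distribution approaches uniform.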


import random
import sys

# callbacks_list holds the ModelCheckpoint callback defined in the
# "Inference and Results" snippet; define it before running this loop.
for epoch in range(1, 31):
    print('epoch', epoch)
    # Fit the model for 1 epoch
    model.fit(x, y, batch_size=128, epochs=1, callbacks=callbacks_list)

    # Select a text seed randomly
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('---Seeded text: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ Selected temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 100 characters
        for i in range(100):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()


Inference and Results 


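We used the code below to save checkpoints, binary HDF5 files that store all the model weights,
whenever the training loss improves. The callbacks_list it creates is passed to model.fit() in the
training loop above, so this code must run before that loop.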


from keras.callbacks import ModelCheckpoint 


filepath = "weights-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint] 
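Keras fills the placeholders in filepath from the epoch number and the logged metrics using Python's str.format, so the template can be checked directly. The epoch and loss values below are made up for illustration. A space inside the format spec, as in {loss: .4f}, would insert a leading space before positive numbers. That is why the spec is written without one.

```python
filepath = "weights-{epoch:02d}-{loss:.4f}.hdf5"

# Hypothetical values: epoch 30 with a training loss of 1.54531
print(filepath.format(epoch=30, loss=1.54531))
# Single-digit epochs are zero-padded to two digits
print(filepath.format(epoch=3, loss=2.1))
```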


Below is the code to use the trained models and generate new text. 


seed_text = 'i want to generate new text after this '
print(seed_text)

# load the network weights (a checkpoint file written by ModelCheckpoint during training)
filename = "weights-30-1.545.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')

for temperature in [0.5]:
    print('------ temperature:', temperature)
    sys.stdout.write(seed_text)

    # We generate 400 characters
    for i in range(400):
        sampled = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(seed_text):
            sampled[0, t, char_indices[char]] = 1.

        preds = model.predict(sampled, verbose=0)[0]
        next_index = sample(preds, temperature)
        next_char = chars[next_index]

        seed_text += next_char
        seed_text = seed_text[1:]

        sys.stdout.write(next_char)
        sys.stdout.flush()
    print()


After training the model successfully, here is what we get at epoch 30:


--- Generating with seed:
"the "good old time" to which it belongs, and as an expressio"

------ temperature: 0.2
the "good old time" to which it belongs, and as an expression of the sense of the stronger and subli

------ temperature: 0.5
and as an expression of the sense of the stronger and sublication of possess and more spirit and in

------ temperature: 1.0
e stronger and sublication of possess and more spirit and instinge, and it: he ventlumentles, no dif

------ temperature: 1.2
d more spirit and instinge, and it: he ventlumentles, no differific and does amongly domen--whete ac


As we can see, with low temperature values the model generates words that are more realistic
and practical. With higher temperatures, the generated text becomes more interesting,
surprising, even creative; it sometimes invents completely new words that sound somewhat
plausible. Low temperatures are therefore better suited to business use cases where the output
needs to be realistic, while higher temperatures can be used in more creative and artistic use
cases.


The magic lies in the balance between learned structure and randomness; that balance is what
makes the generated text interesting.


Generate Lyrics using Deep (Multi-layer) LSTM 


Now that we have learned how to build basic LSTM models for text generation, let's move one
step further and build a multi-layer LSTM model for lyrics generation. The goal of this project is
to generate completely new, original lyrics inspired by the work of an arbitrary number of artists.


Let's begin. 
Code Link : Python-Deep-Learning-Projects/Chapter 6/Lyrics-ai/


Data pre-processing 


To build a model that can generate lyrics, we need a large amount of lyrics data, which can easily
be extracted from various sources. We collected the lyrics of around 10,000 songs and stored
them in a text file called lyrics_data.txt. You can find the data file in the repository.


Now that we have our data ready, we need to convert this raw text into one-hot encoding.


import codecs
import sys

import numpy as np


# Class to perform all preprocessing operations
class Preprocessing:
    separator = '->'

    def __init__(self):
        self.vocabulary = {}          # character -> integer index
        self.binary_vocabulary = {}   # character -> one-hot vector
        self.char_lookup = {}         # integer index -> character
        self.size = 0

    # Build the vocabulary from the data file and create the one-hot encoding.
    def generate(self, input_file_path):
        index = 0
        with codecs.open(input_file_path, 'r', 'utf_8') as input_file:
            for line in input_file:
                for char in line:
                    if char not in self.vocabulary:
                        self.vocabulary[char] = index
                        self.char_lookup[index] = char
                        index += 1
        self.set_vocabulary_size()
        self.create_binary_representation()

    # Load a previously serialized vocabulary into memory.
    def retrieve(self, input_file_path):
        buffer = ""
        with codecs.open(input_file_path, 'r', 'utf_8') as input_file:
            for line in input_file:
                try:
                    separator_position = len(buffer) + line.index(self.separator)
                    buffer += line
                    key = buffer[:separator_position]
                    value = buffer[separator_position + len(self.separator):]
                    value = np.fromstring(value, sep=',')
                    self.binary_vocabulary[key] = value
                    self.vocabulary[key] = np.where(value == 1)[0][0]
                    self.char_lookup[np.where(value == 1)[0][0]] = key
                    buffer = ""
                except ValueError:
                    # No separator on this line: the key (e.g. a newline) spans lines
                    buffer += line
        self.set_vocabulary_size()

    # Below are some helper functions to perform pre-processing.
    def create_binary_representation(self):
        for key, value in self.vocabulary.items():
            binary = np.zeros(self.size)
            binary[value] = 1
            self.binary_vocabulary[key] = binary

    def set_vocabulary_size(self):
        self.size = len(self.vocabulary)
        print("Vocabulary size: {}".format(self.size))

    # Serialize each entry as "character->comma-separated one-hot vector".
    def get_serialized_binary_representation(self):
        string = ""
        np.set_printoptions(threshold=sys.maxsize)
        for key, value in self.binary_vocabulary.items():
            array_as_string = np.array2string(value, separator=',',
                                              max_line_width=self.size * self.size)
            string += "{}{}{}\n".format(key, self.separator,
                                        array_as_string[1:len(array_as_string) - 1])
        return string


The overall objective of the pre-processing module is to convert the raw text data into one-hot
encoding, as shown in the figure below. After the pre-processing module runs successfully, the
vocabulary is dumped to a file named {dataset_filename}.vocab. This vocab file is mandatory: it
must be fed into the model during training, along with the dataset.
Figure: Lyrics_data (raw lyrics text), the character mapping (vocabulary of character keys to integer values), and the one-hot encoding of each character (float64 arrays of shape (83,))
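The core of the pre-processing reduces to two steps. First, assign each new character an index in first-seen order, as generate() does. Second, turn a piece of text into a matrix with one one-hot row per character. The sketch below is a simplified, hypothetical version that works on an in-memory string rather than a file.

```python
import numpy as np


def build_vocabulary(text):
    """Map each character to an index in first-seen order."""
    vocabulary = {}
    for char in text:
        if char not in vocabulary:
            vocabulary[char] = len(vocabulary)
    return vocabulary


def one_hot(text, vocabulary):
    """Return a [len(text), vocab_size] matrix with one 1 per row."""
    encoded = np.zeros((len(text), len(vocabulary)))
    for position, char in enumerate(text):
        encoded[position, vocabulary[char]] = 1
    return encoded


lyrics = "My so called friend\n"
vocab = build_vocabulary(lyrics)
matrix = one_hot(lyrics, vocab)
print(len(vocab), matrix.shape)
```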


Define Model 


We will use a different approach to build this model. Since it is more complex, we will use
TensorFlow to write each layer from scratch. The model has two placeholders, which hold the
input and output values.


# This code uses the TensorFlow 1.x API (tf.placeholder, tf.contrib)
import pickle

import tensorflow as tf
from tensorflow.contrib import rnn


def build(self, input_number, sequence_length, layers_number, units_number, output_number):
    self.x = tf.placeholder("float", [None, sequence_length, input_number])
    self.y = tf.placeholder("float", [None, output_number])
    self.sequence_length = sequence_length


Then we create the variables to store weights and bias. 


    self.weights = {
        'out': tf.Variable(tf.random_normal([units_number, output_number]))
    }
    self.biases = {
        'out': tf.Variable(tf.random_normal([output_number]))
    }

    # [batch, time, input] -> list of sequence_length tensors of shape [batch, input]
    x = tf.transpose(self.x, [1, 0, 2])
    x = tf.reshape(x, [-1, input_number])
    x = tf.split(x, sequence_length, 0)
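static_rnn expects a Python list with one [batch, input_number] tensor per time step, which is what the transpose, reshape, and split above produce. The NumPy sketch below repeats the same three operations on a tiny array with arbitrary sizes. It shows that step t of the result is simply x[:, t, :].

```python
import numpy as np

batch_size, sequence_length, input_number = 2, 4, 3
x = np.arange(batch_size * sequence_length * input_number).reshape(
    batch_size, sequence_length, input_number)

x_t = np.transpose(x, (1, 0, 2))                   # [sequence_length, batch, input]
x_flat = x_t.reshape(-1, input_number)             # [sequence_length * batch, input]
steps = np.split(x_flat, sequence_length, axis=0)  # list of [batch, input]

print(len(steps), steps[0].shape)
```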


This model stacks multiple LSTM layers, each built from a basic LSTM cell with units_number
units, as shown in the figure below.


    lstm_layers = []
    for i in range(0, layers_number):
        lstm_layer = rnn.BasicLSTMCell(units_number)
        lstm_layers.append(lstm_layer)

    deep_lstm = rnn.MultiRNNCell(lstm_layers)
    self.outputs, states = rnn.static_rnn(deep_lstm, x, dtype=tf.float32)
    print("Build model with input_number: {}, sequence_length: {}, layers_number: {}, "
          "units_number: {}, output_number: {}".format(input_number, sequence_length,
                                                       layers_number, units_number,
                                                       output_number))

    # This method is used to dump the model configuration
    self.save(input_number, sequence_length, layers_number, units_number, output_number)
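MultiRNNCell stacks the cells so that the output of one LSTM layer becomes the input of the next. The NumPy sketch below illustrates the same idea with a simplified, hand-written LSTM: one fused weight matrix per layer, no forget-gate bias, and random, untrained weights. It is not TensorFlow's implementation, only a demonstration of how stacking works, and the sizes are arbitrary.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm_layer(rng, input_dim, units):
    # One fused weight matrix for the four gates (input, forget, candidate, output)
    return {"W": rng.normal(scale=0.1, size=(input_dim + units, 4 * units)),
            "b": np.zeros(4 * units)}


def lstm_layer(params, inputs):
    """Run one LSTM layer over a list of [batch, input_dim] arrays."""
    units = params["b"].shape[0] // 4
    h = np.zeros((inputs[0].shape[0], units))
    c = np.zeros_like(h)
    outputs = []
    for x_t in inputs:
        z = np.concatenate([x_t, h], axis=1) @ params["W"] + params["b"]
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return outputs


def deep_lstm(layers, inputs):
    """Stacking: each layer's output sequence is the next layer's input sequence."""
    for params in layers:
        inputs = lstm_layer(params, inputs)
    return inputs


rng = np.random.default_rng(0)
input_number, units_number, layers_number = 5, 8, 2
layers = [init_lstm_layer(rng, input_number if k == 0 else units_number, units_number)
          for k in range(layers_number)]
steps = [rng.normal(size=(3, input_number)) for _ in range(4)]  # 4 time steps, batch of 3
outputs = deep_lstm(layers, steps)
print(len(outputs), outputs[-1].shape)
```

This sketch runs each layer over the whole sequence before starting the next. For a unidirectional stack, that gives the same result as MultiRNNCell's step-by-step order.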
Train


Given the mandatory inputs (the dataset file path, the vocab file path, and the model name), we
will start the training process. Let's define all the hyperparameters for the model.


import argparse
import os

from modules.Model import *
from modules.Batch import *


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--training_file', type=str, required=True)
    parser.add_argument('--vocabulary_file', type=str, required=True)
    parser.add_argument('--model_name', type=str, required=True)
    parser.add_argument('--epoch', type=int, default=200)
    parser.add_argument('--batch_size', type=int, default=50)
    parser.add_argument('--sequence_length', type=int, default=50)
    parser.add_argument('--log_frequency', type=int, default=100)
    parser.add_argument('--learning_rate', type=float, default=0.002)
    parser.add_argument('--units_number', type=int, default=128)
    parser.add_argument('--layers_number', type=int, default=2)
    args = parser.parse_args()


Since we train the model in batches, we divide the dataset into batches of the defined
batch_size using the Batch module.


batch = Batch(training_file, vocabulary_file, batch_size, sequence_length) 


Each batch returns two arrays. The first is the input array of sequences, with shape
[batch_size, sequence_length, vocab_size]. The second is the label array, with shape
[batch_size, vocab_size].
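The Batch module itself is not listed in this chapter. As an illustration only, a hypothetical make_batch() helper can cut overlapping windows out of a one-hot encoded text. It produces arrays with exactly these shapes, and each label is the character that follows its window.

```python
import numpy as np


def make_batch(encoded, start, batch_size, sequence_length):
    """encoded: one-hot array of shape [num_chars, vocab_size]."""
    xs, ys = [], []
    for offset in range(start, start + batch_size):
        xs.append(encoded[offset:offset + sequence_length])
        ys.append(encoded[offset + sequence_length])  # the character that follows
    return np.array(xs), np.array(ys)


vocab_size = 6
encoded = np.eye(vocab_size)[np.arange(30) % vocab_size]  # toy one-hot "text"
batch_x, batch_y = make_batch(encoded, 0, batch_size=4, sequence_length=5)
print(batch_x.shape, batch_y.shape)
```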


Now we initialize our model and define the cost as the mean squared error between the classifier
output and the one-hot labels. We then create the optimizer; in this model, we use the Adam
optimizer. Finally, we train the model, running the optimization over each batch:


model = Model(model_name)
model.build(input_number, sequence_length, layers_number, units_number, classes_number)
classifier = model.get_classifier()

cost = tf.reduce_mean(tf.square(classifier - model.y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

##----SKIPPING FEW LINES---- ##

with tf.Session() as sess:
    sess.run(init)
    iteration = 0

    while batch.dataset_full_passes < epoch:
        iteration += 1
        batch_x, batch_y = batch.get_next_batch()
        batch_x = batch_x.reshape((batch_size, sequence_length, input_number))

        sess.run(optimizer, feed_dict={model.x: batch_x, model.y: batch_y})
        if iteration % log_frequency == 0:
            acc = sess.run(accuracy, feed_dict={model.x: batch_x, model.y: batch_y})
            loss = sess.run(cost, feed_dict={model.x: batch_x, model.y: batch_y})
            print("Iteration {}, batch loss: {:.6f}, training accuracy: {:.5f}".format(
                iteration * batch_size, loss, acc))
    batch.clean()


Once the model is trained, the checkpoints are stored, and we can use them later for inference.
Here is a snapshot of the accuracy and the loss during the training process.

Figure: training accuracy and loss


Inference 


Once the model is ready, we can use it to make predictions. We start by defining all the
parameters. For inference, we need to provide some seed text, as we did with the previous model.
We also provide the path of the vocab file, the output file in which the generated lyrics will be
stored, and the length of the text to generate.


import argparse
import codecs
from collections import deque

from modules.Model import *
from modules.Preprocessing import *


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_name', type=str, required=True)
    parser.add_argument('--vocabulary_file', type=str, required=True)
    parser.add_argument('--output_file', type=str, required=True)
    parser.add_argument('--seed', type=str, default="Yeah, oho ")
    parser.add_argument('--sample_length', type=int, default=1500)
    parser.add_argument('--log_frequency', type=int, default=100)


Now we load the model by providing the model name we used during training. Along with the
model, we also restore the vocabulary from the vocab file.


model = Model(model_name) 
model.restore() 
classifier = model.get_classifier() 


vocabulary = Preprocessing() 
vocabulary.retrieve(vocabulary_file) 


We use a stack (a collections.deque) to hold a fixed-length window of characters. Each generated
character is appended to it and the oldest character is removed. The same stack is then fed back
into the model in an iterative fashion.
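The sliding-window mechanics can be seen in isolation in the sketch below. The predict_next argument is a hypothetical stand-in for the model call, and generate_with_window() is an illustrative name, not part of the chapter's code.

```python
from collections import deque


def generate_with_window(seed, predict_next, sample_length):
    """Keep a fixed-length window of the most recent characters."""
    stack = deque(seed)
    generated = []
    for _ in range(sample_length):
        next_char = predict_next("".join(stack))  # stand-in for the model call
        stack.popleft()          # drop the oldest character
        stack.append(next_char)  # add the newest one
        generated.append(next_char)
    return "".join(generated), "".join(stack)


# Stand-in predictor: echoes the first character of the current window
out, window = generate_with_window("abc", lambda w: w[0], 3)
print(out, window)
```

Because popleft() and append() are paired, the window always keeps the length of the seed, so the model always receives the same number of time steps. A deque created with a maxlen argument would discard the oldest item automatically on append.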


stack = deque()  # window of the most recent characters fed to the model
sample_file = codecs.open(output_file, 'w', 'utf_8')

for char in seed:
    if char not in vocabulary.vocabulary:
        print(char, "is not in vocabulary file")
        char = u' '
    stack.append(char)
    sample_file.write(char)

with tf.Session() as sess:
    tf.global_variables_initializer().run()

    saver = tf.train.Saver(tf.global_variables())
    ckpt = tf.train.get_checkpoint_state(model_name)

    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)

        for i in range(0, sample_length):
            vector = []
            for char in stack:
                vector.append(vocabulary.binary_vocabulary[char])
            vector = np.array([vector])
            prediction = sess.run(classifier, feed_dict={model.x: vector})
            predicted_char = vocabulary.char_lookup[np.argmax(prediction)]

            stack.popleft()
            stack.append(predicted_char)
            sample_file.write(predicted_char)

            if i % log_frequency == 0:
                print("Progress: {}%".format((i * 100) // sample_length))

sample_file.close()
print("Sample saved in {}".format(output_file))


Output 


After successful execution, we get our own freshly brewed, AI-generated lyrics, ready to be
reviewed and published. Here is one sample of such lyrics. We have modified some of the
spellings so that the sentences make sense.


Yeah, oho once upon a time, on ir intasd 


I got monk that wear your good 
So heard me down in my clipp 


Cure me out brick 
Coway got baby, I wanna sheart in faic 


I could sink awlrook and heart your all feeling in the firing of to the still hild, gavelly mind, have 
before you, their lead 

Oh, oh shor,s sheld be you und make 

Oh, fseh where sufl gone for the runtome 

Weaaabe the ligavus I feed themust of hear 


Here we can see that the model has learned to structure its output into lines and sentences with
appropriate spacing. It is still far from perfect, and much of the text does not make sense.


Seeing signs of success: the first task is to create a model that can learn; the second is to
improve that model. This can be achieved by training the model on a larger dataset and for
longer.


Generate Music using Multi-layer LSTM 


Now that we have learned to generate lyrics, let's jump to the more creative part and create some
music using multiple layers of LSTM, as shown in the figure below. By now, we know that RNNs
are good at modeling sequential data, and a music track can be represented as a sequence of
notes and chords. Note objects contain information about the pitch, octave, and offset of the
note. Chord objects are essentially containers for a set of notes that are played at the same time.







Pitch refers to the frequency of the sound, that is, how high or low it is, and is represented with
the letters A, B, C, D, E, F, and G. Within an octave, the letters ascend in the order C, D, E, F, G,
A, B.


Octave refers to which set of pitches you use on an instrument. For example, C4 and C5 have
the same note name one octave apart, and C5 has twice the frequency of C4.

Offset refers to where the note is located in the piece.


In this section, we will learn how to generate music by first processing the sound files, then
converting them into sequential mapping data, and finally using an RNN to train the model.


So let's do it. 
Code Link : Python-Deep-Learning-Projects/Chapter 6/Music-ai/ 


Data Pre-processing 


To generate music, we need lots of music files, from which we extract sequences to build our
training dataset. To simplify the process, in this chapter we use soundtracks of a single
instrument. We collected some melodies, stored as MIDI files. Here is a sample of a MIDI file.


Figure: Note Quarter Length by Pitch (pitch plotted against offset for a sample MIDI file)


We can see that there are various notes with intervals between them. We will load the data from
the MIDI files into an array, as shown in the code snippet below.

We will use Music21 to extract the contents of our dataset, and to take the output of the neural
network and translate it into musical notation. Music21 is a Python toolkit used for
computer-aided musicology.

We start by loading each file into a Music21 stream object using the converter.parse(file)
function. Using that stream object, we get a list of all the notes and chords in the file. We
append the pitch of every note object using its string notation, since the most significant parts
of the note can be recreated from the string notation of the pitch. We append every chord by
encoding the id of every note in the chord together into a single string, with the notes separated
by dots. These encodings allow us to easily decode the output generated by the network into the
correct notes and chords.


from music21 import converter, instrument, note, chord
import glob

notes = []

for file in glob.glob("/data/*.mid"):
    midi = converter.parse(file)
    notes_to_parse = None
    parts = instrument.partitionByInstrument(midi)
    if parts:  # file has instrument parts
        notes_to_parse = parts.parts[0].recurse()
    else:  # file has notes in a flat structure
        notes_to_parse = midi.flat.notes
    for element in notes_to_parse:
        if isinstance(element, note.Note):
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord):
            notes.append('.'.join(str(n) for n in element.normalOrder))
Figure: MIDI files in the dataset (for example, Finalfantasy6fanf fortresscondor.mid, redwings.mid, relmstheme-piano.mid) and the resulting array of notes and chords (for example, 'C6', 'E-5', 'F5', '0.4.7', '5.9.0', '3.7.10')
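The string encoding makes decoding simple. A token that contains dots, or that consists only of digits (a single-note chord), is a chord of pitch-class integers. Anything else is a note name. The helpers below are a hypothetical plain-Python sketch of that convention. Turning the decoded values back into Music21 Note and Chord objects is left out.

```python
def encode_chord(pitch_classes):
    """Join the chord's pitch classes with dots, e.g. [0, 4, 7] -> '0.4.7'."""
    return '.'.join(str(p) for p in pitch_classes)


def decode_token(token):
    """Return ('chord', [ints]) for chord tokens, otherwise ('note', name)."""
    if '.' in token or token.isdigit():
        return 'chord', [int(p) for p in token.split('.')]
    return 'note', token


for token in ['C6', 'E-5', '0.4.7', '9']:
    print(token, decode_token(token))
```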


Next, we have to create input sequences for the network and their respective outputs as shown in 
the figure below. The output for each input sequence will be the first note or chord that comes 
after the sequence of notes in the input sequence in our list of notes. The final step in preparing 
the data for the network is to normalize the input and one-hot encode the output. 
import numpy as np
from keras.utils import np_utils

sequence_length = 100
# get all pitch names
pitchnames = sorted(set(item for item in notes))
n_vocab = len(pitchnames)

# create a dictionary to map pitches to integers
note_to_int = dict((pitch, number) for number, pitch in enumerate(pitchnames))

network_input = []
network_output = []

# create input sequences and the corresponding outputs
for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i:i + sequence_length]
    sequence_out = notes[i + sequence_length]
    network_input.append([note_to_int[char] for char in sequence_in])
    network_output.append(note_to_int[sequence_out])

n_patterns = len(network_input)

# reshape the input into a format compatible with LSTM layers
network_input = np.reshape(network_input, (n_patterns, sequence_length, 1))

# normalize input
network_input = network_input / float(n_vocab)

# one-hot encode the output
network_output = np_utils.to_categorical(network_output)
Figure: Lookup table mapping each note or chord string (for example 5.9, 5.9.8, 6.10.1, 6.9.8, 7.10, 7.10.2, 7.11.0, 7.11.2) to an integer value. X, the network input, has shape [None x sequence_length]; Y, the network output, has shape [None x 1].
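To make the windowing concrete, here is a minimal sketch on a hypothetical toy list of six note strings, using a window length of 3 instead of 100. The helper name `make_windows` is ours for illustration, not part of the project code; it applies the same lookup-table scheme as the code above, where each input is a window of integer codes and each target is the code of the note that immediately follows it.

```python
def make_windows(notes, sequence_length):
    """Split a list of note/chord strings into (input window, next-note target)
    pairs, encoded as integers via a sorted lookup table."""
    pitchnames = sorted(set(notes))
    note_to_int = {name: number for number, name in enumerate(pitchnames)}
    inputs, outputs = [], []
    for i in range(len(notes) - sequence_length):
        window = notes[i:i + sequence_length]   # sequence_length consecutive notes
        target = notes[i + sequence_length]     # the note right after the window
        inputs.append([note_to_int[n] for n in window])
        outputs.append(note_to_int[target])
    return note_to_int, inputs, outputs


# Hypothetical toy data: single notes ('E5', 'G5', 'A5') and a chord ('4.7.11')
toy_notes = ['E5', '4.7.11', 'G5', 'E5', '4.7.11', 'A5']
lookup, X, y = make_windows(toy_notes, sequence_length=3)
print(lookup)  # {'4.7.11': 0, 'A5': 1, 'E5': 2, 'G5': 3}
print(X)       # [[2, 0, 3], [0, 3, 2], [3, 2, 0]]
print(y)       # [2, 0, 1]
```

In the real pipeline the integer windows are then reshaped to (n_patterns, sequence_length, 1) and divided by n_vocab, and the targets are one-hot encoded.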


Define model and Train 


Finally, we get to design the model architecture. In our model we use four different types of 
layers: 


LSTM layers are recurrent (RNN) layers that take a sequence as input and return either a whole sequence (return_sequences=True) or only a final output vector.


Dropout layers are a regularisation technique that reduces overfitting by randomly setting a fraction of the input units to 0 during training.


Dense (fully connected) layers connect each input node to each output node.


The Activation layer determines what activation function our neural network will use to 
calculate the output of a node. 
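As an illustration of what a Dropout layer does during training, here is a minimal NumPy sketch of "inverted" dropout, the variant described in the Keras documentation: each unit is zeroed with probability `rate` and the surviving units are scaled by 1 / (1 - rate), so the expected activation is unchanged; at inference time the input passes through untouched. The function name `dropout` and the toy input are illustrative assumptions, not Keras internals.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero each element with probability `rate` during training
    and rescale the kept elements by 1 / (1 - rate); identity at inference time."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate   # True for units that survive
    return np.where(keep_mask, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
activations = np.ones((2, 8))
print(dropout(activations, rate=0.5, training=True, rng=rng))   # mix of 0.0 and 2.0
print(dropout(activations, rate=0.5, training=False, rng=rng))  # unchanged ones
```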


For a quick implementation we will again use the Keras API.


from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

model = Sequential()
model.add(LSTM(
    256,
    input_shape=(network_input.shape[1], network_input.shape[2]),
    return_sequences=True
))
model.add(Dropout(0.5))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(256))
model.add(Dense(256))
model.add(Dropout(0.3))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])


So here we have designed an architecture consisting of three LSTM layers, three Dropout layers, two Dense layers and one Activation layer, as shown in the figure below. To calculate the loss at each training iteration we use categorical cross-entropy, and to optimize the network we use the RMSprop optimizer (optimizer='rmsprop' in the code above). Once the model architecture is ready, it's time to train the model. We train it for 200 epochs with a batch size of 64 using model.fit(), and use a ModelCheckpoint callback to save the weights whenever the training loss improves.
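For a single training example, categorical cross-entropy is the negative log of the probability the softmax output assigns to the true next note, L = -sum_k y_k log(p_k), where y is the one-hot target. The NumPy sketch below uses a made-up 4-entry vocabulary and averages the loss over a batch; the function name and numbers are illustrative assumptions, and the probability clipping only mimics the kind of log(0) guard that Keras applies internally.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch.

    y_true: one-hot targets, shape (batch, n_vocab)
    y_pred: softmax probabilities, shape (batch, n_vocab)
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)               # avoid log(0)
    per_example = -np.sum(y_true * np.log(y_pred), axis=1)  # -log p(true note)
    return per_example.mean()


# Toy batch: 2 examples, vocabulary of 4 notes/chords
y_true = np.array([[0, 1, 0, 0],
                   [0, 0, 0, 1]], dtype=float)
y_pred = np.array([[0.1, 0.7, 0.1, 0.1],       # true note gets p = 0.7
                   [0.25, 0.25, 0.25, 0.25]])  # uniform guess, p = 0.25
print(categorical_crossentropy(y_true, y_pred))  # (-ln 0.7 - ln 0.25) / 2
```

A confident correct prediction drives the loss toward 0, while a uniform guess over n classes costs ln(n), which is why the loss falls as the network learns which note comes next.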




Input Nodes: (None, 100, 1)
lstm_1: LSTM -> (None, 100, 256)
dropout_1: Dropout -> (None, 100, 256)
lstm_2: LSTM -> (None, 100, 512)
dropout_2: Dropout -> (None, 100, 512)
lstm_3: LSTM -> (None, 256)
dense_1: Dense -> (None, 256)
dropout_3: Dropout -> (None, 256)
dense_2: Dense -> (None, 359)
activation_1: Activation -> (None, 359)




Figure 6.x Architecture of the model. 


from keras.callbacks import ModelCheckpoint

# save a checkpoint whenever the training loss reaches a new minimum
filepath = "weights-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(
    filepath,
    monitor='loss',
    verbose=0,
    save_best_only=True,
    mode='min'
)
callbacks_list = [checkpoint]

history = model.fit(network_input, network_output, epochs=200, batch_size=64, callbacks=callbacks_list)


Once the training process is completed, we will have one or more weight files (a new one is written each time the loss improves), and we will use the one with the lowest loss later on to generate music. The performance of the model can be seen below.
Generating music


Now it's time for the real fun: we are going to generate some instrumental music. For this, we reuse the code from the training section to prepare the data and set up the network model in the same way as before, except that instead of training the network we load the weights saved during training into the model.


# The layer sizes must match the trained model exactly, or load_weights() will fail
model = Sequential()
model.add(LSTM(
    256,
    input_shape=(network_input.shape[1], network_input.shape[2]),
    return_sequences=True
))
model.add(Dropout(0.5))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(256))
model.add(Dense(256))
model.add(Dropout(0.3))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Load the trained weights into the model
# ('weights_file.hdf5' stands for the checkpoint file saved during training)
model.load_weights('weights_file.hdf5')


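So we created the same model again for prediction, with one extra line of code to load the trained weights into memory. The layer sizes must be identical to those of the trained model; otherwise load_weights() cannot match the saved weights to the layers.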
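Because the ModelCheckpoint callback above writes a new file every time the loss improves, you end up with several files named like weights-{epoch:02d}-{loss:.4f}.hdf5, and 'weights_file.hdf5' in the code above only stands for one of them. Below is a minimal sketch for picking the file with the lowest loss, assuming that naming pattern; the helper `best_checkpoint` and the regular expression are hypothetical, not part of Keras.

```python
import os
import re

# Matches names produced by "weights-{epoch:02d}-{loss:.4f}.hdf5"
# (optional whitespace tolerated before the loss value)
CHECKPOINT_RE = re.compile(r"weights-(\d+)-\s*(\d+(?:\.\d+)?)\.hdf5$")


def best_checkpoint(paths):
    """Return the checkpoint path whose encoded loss is smallest, or None."""
    best_path, best_loss = None, float("inf")
    for path in paths:
        match = CHECKPOINT_RE.search(os.path.basename(path))
        if match is None:
            continue                  # skip files that don't follow the pattern
        loss = float(match.group(2))
        if loss < best_loss:
            best_path, best_loss = path, loss
    return best_path


# Usage (hypothetical), with `import glob`:
# model.load_weights(best_checkpoint(glob.glob("weights-*.hdf5")))
```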




Since the model needs a seed input to start generating music, we use a randomly chosen sequence of notes from our processed data. You can also provide your own notes, but make sure the seed sequence has a length of 100 (sequence_length) and contains only notes and chords that appear in the training vocabulary.


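# Pick a random seed: `sequence_length` consecutive notes from the processed data,
# kept as plain integers (network_input is already reshaped and normalized)
start = numpy.random.randint(0, len(notes) - sequence_length)
pattern = [note_to_int[n] for n in notes[start:start + sequence_length]]

int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
prediction_output = []

# Generate 1000 notes of music
for note_index in range(1000):
    # shape (1, sequence_length, 1) and normalize exactly as during training
    prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
    prediction_input = prediction_input / float(n_vocab)

    prediction = model.predict(prediction_input, verbose=0)

    # take the most likely next note or chord
    index = int(numpy.argmax(prediction))
    result = int_to_note[index]
    prediction_output.append(result)

    # slide the window: append the prediction and drop the oldest note
    pattern.append(index)
    pattern = pattern[1:len(pattern)]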


We run the generation loop 1000 times, which produces 1000 notes. Since each note advances the offset by 0.5 quarter notes, this is 500 beats, or roughly four minutes of music at the MIDI default tempo of 120 beats per minute. To choose the input for each iteration we use a sliding window: the first input is the seed sequence at the starting index, and for every subsequent iteration we remove the first note of the sequence and append the output of the previous iteration at the end. This is a rather crude approach, and you can play around with adding some randomness, for example by sampling the next note from the predicted probabilities instead of always taking the most likely one, which may make the generated music more creative.
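One common way to add that randomness, which is our suggestion rather than part of the original code, is temperature sampling: rescale the softmax output with a temperature and sample from it. Temperatures below 1 make the choice more conservative (close to argmax), temperatures above 1 make it more adventurous. A minimal sketch; the function name and the 0.8 value are illustrative assumptions.

```python
import numpy as np


def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Sample an index from a probability vector rescaled by `temperature`.

    temperature -> 0 approaches argmax; temperature = 1 samples from probs as-is.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float).ravel()      # accepts shape (1, n_vocab)
    logits = np.log(np.clip(probs, 1e-12, None)) / temperature
    logits -= logits.max()                              # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    return int(rng.choice(len(weights), p=weights))


# Usage inside the generation loop, in place of numpy.argmax(prediction):
# index = sample_with_temperature(prediction, temperature=0.8)
```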


Now that we have the encoded representations of the generated notes and chords in a list, we can decode them and create a list of Note and Chord objects. To do so you can use the helper function below, which determines whether each output is a Note or a Chord.


If the pattern is a chord, it splits the string into its individual notes, loops through the string representation of each note to create a Note object for it, and then creates a Chord object containing all of these notes.


If the pattern is a Note, it creates a Note object using the string representation of the pitch 
contained in the pattern. 


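At the end of each iteration, it increases the offset by 0.5 (music21 offsets are measured in quarter notes, so each event starts an eighth note after the previous one). This value can also be changed, or randomness can be introduced into it.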


Finally, it uses a music21 Stream object to write the MIDI file.


Here are a few samples of generated music [here].


from music21 import chord, instrument, note, stream


def create_midi_file(prediction_output):
    """Convert the output from the prediction to notes and create a MIDI file."""
    offset = 0
    output_notes = []

    for pattern in prediction_output:
        # pattern is a chord, e.g. '4.7.11', or a single pitch-class digit such as '9'
        if ('.' in pattern) or pattern.isdigit():
            notes_in_chord = pattern.split('.')
            notes = []
            for current_note in notes_in_chord:
                new_note = note.Note(int(current_note))
                new_note.storedInstrument = instrument.Piano()
                notes.append(new_note)
            new_chord = chord.Chord(notes)
            new_chord.offset = offset
            output_notes.append(new_chord)
        # pattern is a note, e.g. 'E5'
        else:
            new_note = note.Note(pattern)
            new_note.offset = offset
            new_note.storedInstrument = instrument.Piano()
            output_notes.append(new_note)

        # increase offset each iteration so that notes do not stack
        offset += 0.5

    midi_stream = stream.Stream(output_notes)
    midi_stream.write('midi', fp='generated.mid')


create_midi_file(prediction_output)
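If you want to check the decoding logic without music21 installed, the same branching can be sketched in plain Python. The hypothetical helper `decode_patterns` below returns (offset, kind, pitches) tuples instead of music21 objects, using the same rule as create_midi_file (a string containing '.' or made only of digits is a chord of integer pitches) and the same fixed 0.5 offset step.

```python
def decode_patterns(prediction_output, step=0.5):
    """Plain-Python mirror of the note/chord branching in create_midi_file."""
    events = []
    offset = 0.0
    for pattern in prediction_output:
        if ('.' in pattern) or pattern.isdigit():
            # chord: dot-separated integers, e.g. '4.7.11' -> [4, 7, 11]
            events.append((offset, 'chord', [int(p) for p in pattern.split('.')]))
        else:
            # single note name, e.g. 'E5' or 'E-5'
            events.append((offset, 'note', [pattern]))
        offset += step
    return events


print(decode_patterns(['E5', '4.7.11', '9']))
# [(0.0, 'note', ['E5']), (0.5, 'chord', [4, 7, 11]), (1.0, 'chord', [9])]
```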


Conclusion 


Wow, that's an impressive set of practical examples of using Deep Learning Projects in Python to 
build solutions in the creative space! Let's revisit the goals we set up for ourselves: 


Define the goal: In this project, we're going to take the next step in our computational linguistics journey in Deep Learning Projects in Python and GENERATE new content for our client. We need to help them by providing a deep learning solution that generates new content that can be used in movie scripts, song lyrics, and music.


Deep Learning generated content for creative purposes is obviously very tricky. Our realistic goal in this chapter was to demonstrate and train you on the skills and architecture needed to get started on these types of projects. Producing acceptable results takes your interaction with the data, the model, the outputs AND testing with appropriate audiences. The key takeaways to remember are that the outputs of your models can be highly personalized to the task, and that you should expand your thinking about the types of business use cases you feel comfortable working on in your career.


In this chapter, we implemented generative models that create content using LSTMs. We implemented models for both text and audio that generate content for artists and various businesses in the creative space (hypothetically, the music and movie industries).


What we learned in this chapter:


1. Text generation with LSTMs

2. The additional power of a bidirectional LSTM for text generation

3. Deep (multi-layer) LSTMs to generate lyrics for a song

4. Deep (multi-layer) LSTMs to generate the music for a song


The exciting work in Deep Learning keeps on coming in the next chapter... let's see what's in store!
Building Speech Recognition with
DeepSpeech2 


Coming soon... 


Handwritten digits classification using 
ConvNets 


Coming soon... 
Real-time Object Detection using OpenCV
and TensorFlow 


Coming soon... 
Building Face Recognition using OpenFace
and Clustering 


Coming soon... 
Semantic Labeling of an image using Pixel-Level
clustering and Depth Layering


Coming soon... 


Automated Image Captioning with 
NeuralTalk model 


Coming soon... 


Pose Estimation on 3D models using 
ConvNets 


Coming soon... 
Image translation using GANs for style
transfer 


Coming soon... 
Develop Autonomous Agents with Deep Reinforcement
Learning


Coming soon... 




Next Steps in your Deep Learning Career 


Coming soon... 